Notes on software developmenthttp://notes.eatonphil.com/Notes on software developmenthttp://www.rssboard.org/rss-specificationpython-feedgenenFri, 28 Feb 2025 18:07:45 +0000- Minimal downtime Postgres major version upgrades with EDB Postgres Distributedhttp://notes.eatonphil.com/2025-02-28-minimal-downtime-postgres-major-version-upgrades-edb-postgres-distributed.html<head>
<meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/minimal-downtime-postgres-major-version-upgrades-edb-postgres-distributed'" />
</head><p>This is an external post of mine. Click
<a href="https://www.enterprisedb.com/blog/minimal-downtime-postgres-major-version-upgrades-edb-postgres-distributed">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2025-02-28-minimal-downtime-postgres-major-version-upgrades-edb-postgres-distributed.htmlFri, 28 Feb 2025 00:00:00 +0000
- From web developer to database developer in 10 yearshttp://notes.eatonphil.com/2025-02-15-from-web-developer-to-database-developer-in-10-years.html<p>Last month I completed my first year at EnterpriseDB. I'm on the team
that built and maintains
<a href="https://github.com/2ndQuadrant/pglogical">pglogical</a> and that, over
the years, has contributed a good chunk of the logical replication
functionality that exists in community Postgres. Most of my work, our
work, is in C and Rust with tests in Perl and Python. Our focus these
days is a descendant of pglogical called <a href="https://www.enterprisedb.com/docs/pgd/latest/">Postgres
Distributed</a>, which
supports replicating DDL, tunable consistency across the cluster, etc.</p>
<p>This post is about how I got here.</p>
<h3 id="black-boxes">Black boxes</h3><p>I was a web developer from 2014-2021†. I wrote
JavaScript and HTML and CSS and whatever server-side language: Python
or Go or PHP. I was a hands-on engineering manager from 2017-2021. I
was pretty clueless about databases and indeed database knowledge was
not a serious part of any interview I did.</p>
<p>Throughout that time (2014-2021) I wanted to move my career forward as
quickly as possible so I spent much of my free time doing educational
projects and writing about them on this blog (or previous incarnations
of it). I learned how to write primitive HTTP servers, how to write
little parsers and interpreters and compilers. It was a virtuous cycle
because the internet (Hacker News anyway) liked reading these posts
and I wanted to learn how the black boxes worked.</p>
<p>But I shied away from data structures and algorithms (DSA) because
they seemed complicated and useless to the work that I did. That is,
until 2020 when an inbox page I built started loading more and more slowly as
the inbox grew. My coworker pointed me at <a href="https://use-the-index-luke.com/">Use The Index,
Luke</a> and the DSA scales fell from my
eyes. I wanted to understand this new black box so I <a href="https://notes.eatonphil.com/database-basics.html">built a little
in-memory SQL
database</a> with
support for indexes.</p>
<p>I'm a college dropout so even while I was interested in compilers and
interpreters earlier in my career I never dreamed I could get a job
working on them. Only geniuses and PhDs did that work and I was
neither. The idea of working on a database felt the same. However, I
could work on little database side projects like I had done before on other
topics, <a href="https://notes.eatonphil.com/tags/databases.html">so I
did</a>, including a <a href="https://notes.eatonphil.com/tags/raft.html">series of
explorations</a> of Raft
implementations, others' and my own.</p>
<h3 id="startups">Startups</h3><p>From 2021-2023 I tried to start <a href="https://github.com/multiprocessio/datastation">a
company</a> and when that
didn't pan out I joined TigerBeetle as a cofounder to work on
marketing and community. It was during this time I started the
<a href="https://eatonphil.com/discord.html">Software Internals Discord</a> and
<a href="https://www.reddit.com/r/databasedevelopment/">/r/databasedevelopment</a>
which have since kind of exploded in popularity among professionals
and academics in database and distributed systems.</p>
<p>TigerBeetle was my first job at a database company, and while I
contributed bits of code I was not a developer there. It was a <a href="https://letters.eatonphil.com/2023-01-01-letter-to-a-frontend-developer-asking-about-database-development.html">way
into the
space</a>. And
indeed it was an incredible learning experience both on the cofounder
side and on the database side. I wrote articles with King and Joran
that helped teach and affirm for myself the basics of databases and
consensus-based distributed systems.</p>
<h3 id="holding-out">Holding out</h3><p>When I left TigerBeetle in 2023 I was still not sure if I
could get a job as an actual database developer. My network had
exploded since 2021 (when I started my own company that didn't pan out)
so I had no trouble getting referrals at database companies.</p>
<p>But my background kept leading hiring managers to suggest putting me
on cloud teams doing orchestration in Go <em>around</em> a database rather than
working on the database itself.</p>
<p>I was unhappy with this type-casting
so I held out while unemployed and continued to write posts and <a href="https://eatonphil.com/archive.html">host
virtual hackweeks</a> messing with
Postgres and MySQL. I started the <a href="https://eatonphil.com/2024-database-design-and-implementation.html">first
incarnation</a>
of the Software Internals Book Club during this time, reading
Designing Data Intensive Applications with 5-10 other developers in
Bryant Park. During this time I also started the <a href="https://eatonphil.com/nyc-systems-coffee-club.html">NYC Systems Coffee
Club</a>.</p>
<h3 id="postgres">Postgres</h3><p>After about four months of searching I ended up with three good
offers, all to do C and Rust development on Postgres (extensions) as
an individual contributor. Working on extensions might sound like the
definition of not-sexy, but Postgres APIs are so loosely abstracted
it's really as if you're working on Postgres itself.</p>
<p>You can mess with almost anything in Postgres so you have to be
very aware of what you're doing. And when you can't mess with
something in Postgres because an API doesn't yet exist, companies have
the tendency to just fork Postgres so they can. (This tendency isn't
specific to Postgres; almost every open-source database company seems to have a
long-running internal fork or two of the database.)</p>
<h3 id="enterprisedb">EnterpriseDB</h3><p>Two of the three offers were from early-stage startups and after more
than 3 years being part of the earliest stages of startups I was happy
for a break. But the third offer was from <a href="https://www.enterprisedb.com/blog/Which-Companies-Supporting-PostgreSQL-Development">one of the biggest
contributors</a>
to Postgres, a 20-year-old company called EnterpriseDB. (You can probably come up with
different rankings of companies using different metrics so I'm only
saying EnterpriseDB is <em>one</em> of the biggest contributors.)</p>
<p>It seemed like the best place to be to learn a lot and contribute
something meaningful.</p>
<p>My coworkers are a mix of Postgres veterans (people who
contributed the WAL to Postgres, who contributed MVCC to Postgres, who
contributed logical decoding and logical replication, who contributed
parallel queries; the list goes on and on) and
developers who started at EnterpriseDB in
technical support, or who were previously Postgres administrators.</p>
<p>It's quite a mix. Relatively few geniuses or PhDs, despite what I used
to think, but they certainly work hard and have hard-earned
experience.</p>
<p>Anyway, I've now been working at EnterpriseDB for over a year so I
wanted to share this retrospective. I also wanted to cover what it's
like going from engineering management and founding companies
back to being an individual contributor. (Spoiler: incredibly
enjoyable.) But it has been hard enough to make myself write this much so
I'm calling it a day. :)</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a post about the winding path I took from web developer to database developer over 10 years. <a href="https://t.co/tf8bUDRzjV">pic.twitter.com/tf8bUDRzjV</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1890817374644826387?ref_src=twsrc%5Etfw">February 15, 2025</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>† From 2011-2014 I also did contract web development but this was
part-time while I was in school.</p>
http://notes.eatonphil.com/2025-02-15-from-web-developer-to-database-developer-in-10-years.htmlSat, 15 Feb 2025 00:00:00 +0000
- Edit for clarityhttp://notes.eatonphil.com/2025-01-29-edit-for-clarity.html<p>I have the fortune to review a
<a href="https://eatonphil.com/editor.html">few</a> important blog posts every year and
the biggest value I add is to call out sentences or sections that make
no sense. It is quite simple and you can do it too.</p>
<p>Without clarity only those at your company in marketing and sales
(whose job it is to work with what they get) will give you the
courtesy of a cursory read and a like on LinkedIn. This is all that
most corporate writing achieves. It is the norm and it is understandable.</p>
<p>But if you want to reach an audience beyond those folks, you have to
make sure you're not writing nonsense. And you, as reviewer and
editor, have the chance to call out nonsense if you can get yourself
to recognize it.</p>
<h3 id="immune-to-nonsense">Immune to nonsense</h3><p>But especially when editing blog posts at work, it is easy to gloss
over things that make no sense because we are so constantly
bombarded by things that make no sense. Maybe it's buzzwords or
cliches, or simply lack of rapport. We become immune to nonsense.</p>
<p>And even worse, without care, as we become more experienced, we become
more afraid to say "I have no idea what you are talking about". We're
afraid to look incompetent by admitting our confusion. This fear is
understandable, but is itself stupid. And I will trust you to deal
with this on your own.</p>
<h3 id="read-it-out-loud">Read it out loud</h3><p>So as you review a post, read it out loud to yourself. And if you find
yourself saying "what on earth are you talking about", add that as a
comment as gently as you feel you should. It is not offensive to say
this (depending on how you say it). It is surely the case that the
author did not know they were making no sense. It is worse to not
mention your confusion and allow the author to look like an idiot or a
bore.</p>
<p>Once you can call out what does not make sense to you, then read the
post again and consider what would not make sense to someone without
the context you have. Someone outside your company. Of course you need
to make assumptions about the audience to a degree. It is likely your
customers or prospects you have in mind. Not your friends or family.</p>
<p>With the audience you have in mind, would what you're reading make
any sense? Has the author given sufficient background or introduced
relevant concepts before bringing up something new?</p>
<p>Again this is a second step though. The first step is to make sure
that the post makes sense to <em>you</em>. In almost every draft I read, at my
company or not, there is something that does not make sense to me.</p>
<p>Do two paragraphs need to be reordered because the first one
accidentally depended on information mentioned in the second? Are you
making ambiguous use of pronouns? And so on.</p>
<h3 id="in-closing">In closing</h3><p>Clarity on its own will put you in the 99th percentile of
writing. Beyond that it definitely still matters if you are compelling and
original and whatnot. But too often it seems we focus on being
exciting rather than being clear. And it doesn't matter if you've got
something exciting if it makes no sense to your reader.</p>
<p>This sounds like mundane guidance, but I have reviewed many posts that
were reviewed by other people and no one else called out nonsense. I
feel compelled to mention how important it is.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a new post on the most important, and perhaps least done, thing you can do while reviewing a blog post: edit for clarity. <a href="https://t.co/ODblOUzB3g">pic.twitter.com/ODblOUzB3g</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1884735729625952692?ref_src=twsrc%5Etfw">January 29, 2025</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2025-01-29-edit-for-clarity.htmlWed, 29 Jan 2025 00:00:00 +0000
- An explosion of transitive dependencieshttp://notes.eatonphil.com/2025-01-25-an-explosion-of-transitive-dependencies.html<p>A small standard library means an explosion in transitive
dependencies. A more comprehensive standard library helps you minimize
dependencies. Don't misunderstand me: in a real-world project, it is
practically impossible to have zero dependencies.</p>
<p>Armin Ronacher
<a href="https://lucumr.pocoo.org/2025/1/24/build-it-yourself/">called</a> for a
vibe shift among programmers and I think that this actually exists
already. Everyone I speak to on this topic has agreed that minimizing
dependencies is ideal.</p>
<p>Rust and JavaScript, with their incredibly minimal standard libraries,
<a href="https://notes.eatonphil.com/2024-03-15-zig-rust-and-other-languages.html#standard-library">work against this
ideal</a>. Go, Python, Java, and C# in contrast have
decent standard libraries, which help minimize the explosion of
transitive dependencies.</p>
<h3 id="examples">Examples</h3><p>I think the standard library should reasonably include:</p>
<ul>
<li>JSON, CSV, and Parquet support</li>
<li>HTTP/2 support (which includes TLS, compression, random number generation, etc.)</li>
<li>Support for asynchronous IO</li>
<li>A logging abstraction</li>
<li>A SQL client abstraction</li>
<li>Key abstract data types (BTrees, hashmaps, sets, and growable arrays)</li>
<li>Utilities for working with Unicode, time and timezones</li>
</ul>
<p>But I don't think it needs to include:</p>
<ul>
<li>Excel support</li>
<li>PostgreSQL or Oracle clients</li>
<li>Flatbuffers support</li>
<li>Niche data structures</li>
</ul>
<p>Neither of these is intended to be a complete list; they are just examples.</p>
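<p>To make the first list concrete, here is a minimal sketch in Go that converts JSON to CSV using only packages that ship with the language, so it adds nothing to the dependency tree. The <code>post</code> type and <code>jsonToCSV</code> function are hypothetical names made up for this illustration.</p>

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// post is a hypothetical record type used only for this illustration.
type post struct {
	Title string `json:"title"`
	Views int    `json:"views"`
}

// jsonToCSV decodes a JSON array of posts and re-encodes it as CSV,
// relying only on the standard library.
func jsonToCSV(input string) (string, error) {
	var posts []post
	if err := json.Unmarshal([]byte(input), &posts); err != nil {
		return "", err
	}

	var out strings.Builder
	w := csv.NewWriter(&out)
	// Header row first, then one row per post.
	if err := w.Write([]string{"title", "views"}); err != nil {
		return "", err
	}
	for _, p := range posts {
		if err := w.Write([]string{p.Title, strconv.Itoa(p.Views)}); err != nil {
			return "", err
		}
	}
	// Flush buffered rows and surface any write error.
	w.Flush()
	return out.String(), w.Error()
}

func main() {
	out, err := jsonToCSV(`[{"title": "Edit for clarity", "views": 10}]`)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

<p>In a language with a minimal standard library, each of those two formats would likely pull in a third-party package, along with whatever that package depends on.</p>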
<h3 id="walled-gardens">Walled gardens</h3><p>Minimal standard libraries force growing companies to build out their
own internal collection of "standard libraries". As one example,
Bloomberg <a href="https://github.com/bloomberg/bde/wiki">did this</a> with
C++. And I've heard of companies doing this already with Rust. This
allows larger companies to manage and minimize the explosion of
transitive dependencies over time.</p>
<p>All growing companies likely do something like this eventually. But
again, smaller standard libraries incentivize companies to build this
internal standard library earlier on. And the community benefits
relatively little from these internal standard libraries. The
community would benefit more if large organizations contributed back
to an actual standard library.</p>
<p>Smaller organizations do not have the capacity to build these internal
standard libraries.</p>
<p>Maybe the situation will lead to libraries like Boost for
JavaScript and Rust programmers. That could be fine.</p>
<h3 id="versioning">Versioning</h3><p>A comprehensive standard library does not prevent the
language developers from releasing new versions of the standard
library. It is trivial to do this with naming like Go has done
with the <a href="https://go.dev/blog/v2-go-modules">v2</a>
pattern. <a href="https://go.dev/blog/randv2">math/rand/v2</a> is an example.</p>
<h3 id="conclusion">Conclusion</h3><p>I'm primarily thinking about maintainability, not security. You can
read about the <a href="https://medium.com/@john_25313/c-isnt-a-hangover-rust-isn-t-a-hangover-cure-580c9b35b5ce#:~:text=Rust%20makes%20it,for%20their%20libraries.">security
risks</a>
of using a language with an ecosystem like Rust from someone who is an
expert on the matter.</p>
<p>My concern about the standard library does not stop me from using
Rust and JavaScript. They could choose to invest in the standard
library at any time. We have already begun to see
<a href="https://bun.sh/docs/api/s3">Bun</a> and <a href="https://jsr.io/@std">Deno</a>
do exactly this. But it is clearly an area
for improvement in Rust and JavaScript. And a mistake for other
languages to avoid repeating.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">While zero dependencies is practically impossible, everyone I've spoken to agrees that minimizing dependencies is ideal. Rust and JavaScript work against this ideal. But they could change at any time. And Bun and Deno are already examples of this.<a href="https://t.co/qkSh6oW1Yd">https://t.co/qkSh6oW1Yd</a> <a href="https://t.co/mY1MNErZG7">pic.twitter.com/mY1MNErZG7</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1883162142888853945?ref_src=twsrc%5Etfw">January 25, 2025</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2025-01-25-an-explosion-of-transitive-dependencies.htmlSat, 25 Jan 2025 00:00:00 +0000
- Embedding Python in Rust (for tests)http://notes.eatonphil.com/2025-01-22-embedding-python-rust-tests.html<head>
<meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/embedding-python-rust-tests'" />
</head><p>This is an external post of mine. Click
<a href="https://www.enterprisedb.com/blog/embedding-python-rust-tests">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2025-01-22-embedding-python-rust-tests.htmlWed, 22 Jan 2025 00:00:00 +0000
- Logical replication in Postgres: Basicshttp://notes.eatonphil.com/2025-01-17-logical-replication-postgres-basics.html<head>
<meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/logical-replication-postgres-basics'" />
</head><p>This is an external post of mine. Click
<a href="https://www.enterprisedb.com/blog/logical-replication-postgres-basics">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2025-01-17-logical-replication-postgres-basics.htmlFri, 17 Jan 2025 00:00:00 +0000
- How I run a coffee clubhttp://notes.eatonphil.com/2024-12-31-how-i-run-a-coffee-club.html<p>I started the <a href="https://eatonphil.com/nyc-systems-coffee-club.html">NYC Systems Coffee
Club</a> in December
of 2023. It's gone pretty well! I regularly get around 20 people each
month. You bring a drink if you feel like it and you hang out with
people for an hour or two.</p>
<p>There is no agenda, there is no speaker, there is no structure. The
only "structure" is that when the circle of people talking to each
other gets too big, I break the circle up into two smaller circles so
we can get more conversations going.</p>
<p><img src="/assets/coffeeclub.png" alt="/assets/coffeeclub.png"></p>
<p>People tend to talk in a little circle and then move around over
time. It's basically no different than a happy hour except it is over
a non-alcoholic drink and it's in the morning.</p>
<p>All I have to do as the organizer is periodically tell people about
the <a href="https://eatonphil.com/nyc-systems-coffee-club.html">Google Form</a>
to fill out. I got people to sign up to the list by posting about this
on Twitter and LinkedIn. And then once a month I send an email bcc-ing
everyone on the list and ask them to respond for an invite.</p>
<p><img src="/assets/coffeeclub-invite.png" alt="/assets/coffeeclub-invite.png"></p>
<p>The first 20 people to respond get a calendar invite.</p>
<p><img src="/assets/coffee-club-invite.png" alt="/assets/coffeeclub-invite.png"></p>
<p>I mention all of this because people ask how they can start a coffee
club in their city. They ask how it works. But it's very simple! One
of the least-effortful ways to bring together people in your city.</p>
<p>If your city does not have indoor public spaces, you could use a
food court, or a cafe, or a park during months where it is warm.</p>
<p>For example, the <a href="https://blinsay.com/chc3/">Cobble Hill Computer Coffee
Club</a> is one that meets outdoors at a park.</p>
<p>Good luck! :)</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">How I run a coffee club, a short guide for others who might be interested in running one. It's very simple!<a href="https://t.co/UgRWDQOA3v">https://t.co/UgRWDQOA3v</a> <a href="https://t.co/5wYrLW7u6D">pic.twitter.com/5wYrLW7u6D</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1874213922271879650?ref_src=twsrc%5Etfw">December 31, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-12-31-how-i-run-a-coffee-club.htmlTue, 31 Dec 2024 00:00:00 +0000
- Picking up volleyball in NYC with Goodrec and New York Urbanhttp://notes.eatonphil.com/2024-12-26-volleyball-in-nyc.html<p>I was so intimidated to go at first, but it is in fact easy and fun to
start playing beginner volleyball in New York. The people are so
friendly and welcoming that it has been easy to keep playing
consistently every week since I started for the first time this
August. It's been a great workout and a great way to make friends!</p>
<p>The two platforms I've used to find volleyball games are
<a href="https://www.goodrec.com/">Goodrec</a> and <a href="https://www.nyurban.com/">New York
Urban</a>. While these platforms may also offer
classes and leagues, I mostly use them to play "pickup" games. Pickup
games are where you show up and join (or get assigned to) a team to
play for an hour or two. Easy to go on your own or with friends.</p>
<p>I'm not an expert! My only hope with this post is that maybe it makes
trying out volleyball in New York feel a little less intimidating for
you!</p>
<h3 id="goodrec">Goodrec</h3><p>With Goodrec you have to use their mobile app. Beginner tier is called
"social" on Goodrec. So browse available games until you find one at
the level you want to play. You enroll in (buy a place in) sessions
individually.</p>
<p>Sessions are between 90 and 120 minutes long.</p>
<p><img src="/assets/goodrec-social.png" alt="/assets/goodrec-social.png"></p>
<p>They ask you not to arrive more than 10 minutes early at the gym. When
you arrive you tell the gym managers (usually at a desk up front
somewhere) you're there for Goodrec and the tier (in case the gym has
games at multiple levels going on at the same time). Then you wait until
the Goodrec "host" arrives and they will organize everyone into
teams.</p>
<p>Goodrec hosts are players who volunteer to organize the games. They'll
explain the rules of the game (makes Goodrec very good for beginners)
and otherwise help you out.</p>
<p>Always say thank you to your host!</p>
<h3 id="new-york-urban">New York Urban</h3><p>With New York Urban, pickup sessions are called <a href="https://www.nyurban.com/open-play-volleyball">"open
play"</a>.</p>
<p>There is no mobile app, you just use the website to purchase a spot in
a session. The sessions are longer and cheaper than Goodrec. But there
is no host; players self-organize.</p>
<p>The options are more limited too. You play at one of four high schools
on either a Friday night or on Sunday. And session slots tend to sell
out much more quickly than with Goodrec.</p>
<p><img src="/assets/nyurban-beginner.png" alt="/assets/nyurban-beginner.png"></p>
<h3 id="big-city-volleyball">Big City Volleyball</h3><p>You can also check out <a href="https://bigcityvolleyball.com/">Big City
Volleyball</a> but I haven't used it yet.</p>
<h3 id="volo">Volo</h3><p>I haven't ever done Volo but I think I've heard it described as "beer
league", and that even some of the beginner tier sessions with Goodrec and
New York Urban are more competitive.</p>
<p>But also, Volo is built around leagues so you have to get the timing
right. Goodrec's and New York Urban's pickup games make it easy to get
started playing any time of year.</p>
<h3 id="making-friends">Making friends</h3><p>It was super awkward to go at first! I went by myself. I didn't know
what I was doing. I couldn't remember, and didn't know, many rules. I
didn't have court shoes or knee pads.</p>
<p>But the Goodrec host system is particularly great for bringing
beginners in and making them feel welcome. You have a great time even
if you're terrible.</p>
<p>The first game I went to, I tried to hang out afterward to meet people.
But people either came with their SO or with their friends or by
themselves so they all just left immediately or hung out in their
group.</p>
<p>So you can't just go once and expect to make friends immediately. But
if you keep going at the same place and time regularly week over week,
you'll see familiar faces. Maybe half the people I play with each
week are regulars. If you're friendly you'll start making friends with
these people and eventually start going out to bars with them after
the games.</p>
<h3 id="improving">Improving</h3><p>Even if you find yourself embarrassingly bad at first, just keep
going! I'm 29, 6'1, 190lbs and from observation over the past 5 months,
age, height, and weight have a very indirect relation to playing
ability.</p>
<p>Most of the people who play are self-taught, especially at the lower
tiers I've played at. But some people played for the school team in
high school or college. These people are fun to play with and you can
learn a lot from them.</p>
<p>Most people who are self-taught seem to watch YouTube videos like
<a href="https://www.youtube.com/channel/UCoEMagRUvrXELuJZwS4DevA">Coach
Donny</a>,
helpful for learning how to serve, set, block, etc. Or they take
"clinics" (classes) with Goodrec or other platforms. (I have no idea
about these, I've never done them before.)</p>
<p>At first I played 2 hours a week and I was completely exhausted after
the session. Over time it got easier so I started playing 2-3 sessions
a week (6-9-ish hours). With practice and consistency (after about 3-4
months), I started playing Intermediate tier with Goodrec and New York
Urban. And I don't think I'll play Beginner/Social at all anymore.</p>
<p>I still primarily play for fun and for the workout and to meet
people. But it's also fun to get better!</p>
<p>I played with one person much better than myself in an Intermediate
session one time and he mentioned he will probably stop playing
Intermediate and only play High Intermediate. He mentioned you get
better when you keep pushing yourself to play with better and better
players. Good advice!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a little post on picking up volleyball in new york.<br><br>It's fun, and a great workout, and you meet interesting people!<a href="https://t.co/jEWHbRWF6C">https://t.co/jEWHbRWF6C</a> <a href="https://t.co/ipuIUB1ZnM">pic.twitter.com/ipuIUB1ZnM</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1872394142212661250?ref_src=twsrc%5Etfw">December 26, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-12-26-volleyball-in-nyc.htmlThu, 26 Dec 2024 00:00:00 +0000
- 1 million page viewshttp://notes.eatonphil.com/2024-11-28-1-million-views.html<p>I was delighted to notice this morning that this site has recently
passed 1M page views. And since Murat
<a href="https://muratbuffalo.blogspot.com/2017/02/1-million-pageviews.html">wrote</a>
about his 1M page view accomplishment at the time, I felt compelled to
now too.</p>
<p><img src="/assets/1m-page-views.png" alt="/assets/1m-page-views.png"></p>
<p>I started regularly blogging in 2018. For some reason I decided to
write a blog post every month. And while I have definitely skipped a
month or two here or there, on average I've written 2 posts per month.</p>
<h3 id="tooling">Tooling</h3><p>Since at least 2018 this site has been built with a static site
generator. I might have used a 3rd-party generator at one point, but
for as long as I can remember most of this site has been built with a
<a href="https://github.com/eatonphil/eatonphil.com/blob/main/notes/scripts/build.py">little Python
script</a>
I wrote.</p>
<p>I used to get so pissed when static site generators would pointlessly
change their APIs and I'd have to make pointless changes. I have not
had to make any significant changes to my build code in many years.</p>
<p>I hosted the site itself on GitHub Pages for many years. But I wanted
more flexibility with subdomains (ultimately not something I liked)
and the ability to view server-side logs (ultimately not something I
ever do).</p>
<p>I think this site is hosted on an OVH machine now. But at this point
it is inertia keeping me there. If you have no strong feelings
otherwise, GitHub Pages is perfect.</p>
<p>I used to use Google Analytics but then they shut down the old
version. The new version was incredibly confusing to use. I could not
find some very basic information. So I moved to Fathom which has been
great.</p>
<p>I used to track all subscribers in a Google Form and bcc them but this
became untenable eventually after 1000 subscribers due to Gmail rate
limits. I currently use MailerLite for subscriptions and sending email
about new posts. But this is an absolutely terrible service. They
proxy all links behind a domain that adblockers hate and they also
visually shorten the URL so you can't copy the text of the URL.</p>
<p>I just want a service that has a hosted form for collecting
subscribers and a <code>&lt;textarea&gt;</code> that lets me dump raw HTML and send
that as an email to my subscribers. No branding, no watermarks, no
link proxying. This apparently doesn't exist. I am too lazy to
figure out Amazon SES so I stick with MailerLite for now.</p>
<h3 id="evolution">Evolution</h3><p>In the beginning I talked about little interpreters in JavaScript,
about programming languages, about Scheme. I was into functional
programming. Over time I moved into little emulators and bytecode
VMs. And over the last four years I became obsessed with databases and
distributed systems.</p>
<p>I have almost always written about little projects to teach myself a
concept. Writing a <a href="https://notes.eatonphil.com/lua-in-rust.html">bytecode VM in
Rust</a>, <a href="https://notes.eatonphil.com/emulating-amd64-starting-with-elf.html">emulating a
subset of x86 in
Go</a>,
<a href="https://notes.eatonphil.com/2023-05-25-raft.html">implementing Raft in
Go</a>, <a href="https://notes.eatonphil.com/2024-05-16-mvcc.html">implementing
MVCC isolation levels in
Go</a>, and so on.</p>
<p>So many times when I tried to learn a concept I would find blog posts
with only partial code. The post would link to a GitHub repo that, by
the time I got to the post, had evolved significantly beyond what was
described in the post. The repo code had by then become too complex
for me to follow. So I was motivated to write minimal implementations
and walk through the code in its entirety.</p>
<div class="note">
Even today there is not a single post on implementing TCP/IP from
scratch that walks through entirely working code. (Please, someone
write this.)
</div><p>I have also had a blast writing survey posts such as <a href="https://notes.eatonphil.com/2023-09-21-how-do-databases-execute-expressions.html">how various
databases execute
expressions</a>,
<a href="https://notes.eatonphil.com/javascript-implementations.html">analyzing non-V8 JavaScript
implementations</a>,
<a href="https://notes.eatonphil.com/parser-generators-vs-handwritten-parsers-survey-2021.html">how various programming language implementations parse
code</a>,
and <a href="https://notes.eatonphil.com/whats-the-big-deal-about-key-value-databases.html">how various database systems build on top of key-value
databases</a>.</p>
<p>The last two posts have even each been cited in a research paper
(<a href="https://arxiv.org/pdf/2208.08235">here</a> and
<a href="https://www.usenix.org/system/files/atc23-kaufman.pdf">here</a>).</p>
<h3 id="editing">Editing</h3><p>In terms of quality, my single greatest trick is to read the post out
loud. Multiple times. Notice parts that are awkward or unclear and
rewrite them.</p>
<p>My second greatest trick is to ask friends for review. Some posts like
<a href="https://notes.eatonphil.com/2024-02-08-an-intuition-for-distributed-consensus-in-oltp-systems.html">an intuition for distributed
consensus</a>
and <a href="https://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.html">a write-ahead log is not a universal part of
durability</a>
would simply not have been correct or credible without my fantastic
reviewers. And I'm proud to have <a href="https://eatonphil.com/editor.html">played that
part</a> a few times in turn.</p>
<p>We also have a fantastic #writing-and-drafts channel on the <a href="https://eatonphil.com/discord.html">Software
Internals Discord</a> where folks
(myself occasionally included) come for post review.</p>
<h3 id="context">Context</h3><p>I've lost count of the total number of times that these posts have
been on the front page of Hacker News or that a tweet announcing a
post has reached triple-digit likes. I think I've had 9 posts on the
front page of HN this year. I do know that my single best year for HN was
a 12-month span across 2022-2023 in which 20 of my posts or projects were on
the front page.</p>
<p>Every time a post does well there's a part of me that worries that
I've peaked. But the way to deal with this has been to ignore that
little voice and to just keep learning new things. I haven't stopped
finding things confusing yet, and <a href="https://notes.eatonphil.com/2024-06-14-confusion-is-a-muse.html">confusion is a phenomenal
muse</a>.</p>
<p>And also to, like, go out and meet friends for dinner,
<a href="https://nycsystems.xyz/">run</a>
<a href="https://eatonphil.com/nyc-systems-coffee-club.html">meetups</a>, run <a href="https://eatonphil.com/bookclub.html">book clubs</a>,
<a href="https://eatonphil.com/chat.html">chat</a> with you fascinating internet
strangers, play volleyball, and so on.</p>
<p>It's always been about <a href="https://notes.eatonphil.com/2024-08-24-obsession.html">cultivating healthy
obsessions</a>.</p>
<h3 id="benediction">Benediction</h3><p>In parting, I'll remind you:</p>
<ul>
<li><a href="https://notes.eatonphil.com/is-it-worth-writing-about.html">It is definitely worth writing about</a>,
whatever "it" is</li>
<li><a href="https://twitter.com/eatonphil/status/1854965419745972394">You're not writing enough</a></li>
<li>And <a href="https://eatonphil.com/call-for-posts.html">some ideas for posts I want to hear about if you write about them</a></li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a little reflection on writing after noticing I passed 1M page views this morning.<a href="https://t.co/eIlMDVHNht">https://t.co/eIlMDVHNht</a> <a href="https://t.co/EKSiiDUz5G">pic.twitter.com/EKSiiDUz5G</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1862174926104318407?ref_src=twsrc%5Etfw">November 28, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-11-28-1-million-views.htmlThu, 28 Nov 2024 00:00:00 +0000
- Active and influential NYC infrastructure peoplehttp://notes.eatonphil.com/2024-11-15-active-nyc-infrastructure-people.html<p>These are some of the most influential (mostly due to experience or
expertise) and active folks (I actually see them attend events) in the
NYC infrastructure scene (that I have a personal connection to).</p>
<p>If you're running a dinner or are just looking to meet interesting
people in NYC in software infrastructure, consider this list and feel
free to mention "Phil said you are awesome".</p>
<p>I've normalized titles a little bit but I say every title in the most
generous way. These folks are brilliant.</p>
<p>This list is intentionally randomized. Also not a complete list. I've
surely forgotten (let alone not yet met) great folk.</p>
<ul>
<li><a href="https://www.linkedin.com/in/parkertimmerman/">Parker Timmerman</a>, developer</li>
<li><a href="https://www.linkedin.com/in/mottaqui-karim/">Taq Karim</a>, director of engineering</li>
<li><a href="https://malloc.dog/about/">Peixian Wang</a>, developer</li>
<li><a href="https://www.linkedin.com/in/sujayakar/">Sujay Jayakar</a>, chief scientist</li>
<li><a href="https://www.linkedin.com/in/pauldix/">Paul Dix</a>, ceo</li>
<li><a href="https://www.linkedin.com/in/angelo-saraceno/">Angelo Saraceno</a>, developer</li>
<li><a href="https://www.linkedin.com/in/taylor-baldwin-642b4818/">Taylor Baldwin</a>, cto</li>
<li><a href="https://www.linkedin.com/in/blinsay/">Ben Linsay</a>, cto</li>
<li><a href="https://www.linkedin.com/in/nicholasursa/">Nicholas Ursa</a>, developer</li>
<li><a href="https://www.linkedin.com/in/samgross/">Sam Gross</a>, developer</li>
<li><a href="https://www.linkedin.com/in/tramale-turner-31b24a/">Tramale Turner</a>, vp of engineering</li>
<li><a href="https://www.linkedin.com/in/justinjaffray/">Justin Jaffray</a>, developer</li>
<li><a href="https://www.linkedin.com/in/kwosei/">Kojo Osei</a>, vc</li>
<li><a href="https://www.linkedin.com/in/bryanrussett/">Bryan Russett</a>, ceo</li>
<li><a href="https://www.linkedin.com/in/guilload/">Adrien Guillo</a>, cofounder</li>
<li><a href="https://www.linkedin.com/in/thiagoghisi/">Thiago Ghisi</a>, director of engineering</li>
<li><a href="https://www.linkedin.com/in/gilbert-forsyth-1a368240/">Gil Forsyth</a>, developer</li>
<li><a href="https://www.linkedin.com/in/dan-fried-57b0178/">Dan Fried</a>, cto</li>
<li><a href="https://www.linkedin.com/in/davidagolden/">David Golden</a>, director of engineering</li>
<li><a href="https://www.linkedin.com/in/akshat-bubna-188885103/">Akshat Bubna</a>, cto</li>
<li><a href="https://www.linkedin.com/in/andrew-werner-8228a438/">Andrew Werner</a>, cofounder</li>
<li><a href="https://www.linkedin.com/in/voberoi/">Vikram Oberoi</a>, founder</li>
<li><a href="https://www.linkedin.com/in/samkottler/">Sam Kottler</a>, developer</li>
<li><a href="https://www.linkedin.com/in/jordanthelewis/">Jordan Lewis</a>, director of engineering</li>
<li><a href="https://www.linkedin.com/in/mykolakurutin/">Mykola Kurutin</a>, engineering manager</li>
<li><a href="https://www.linkedin.com/in/paulormg/">Paulo Motta</a>, developer</li>
<li><a href="https://www.linkedin.com/in/priyanka-somrah/">Priyanka Somrah</a>, vc</li>
<li><a href="https://www.linkedin.com/in/jzelinskie/">Jimmy Zelinskie</a>, cpo</li>
<li><a href="https://www.linkedin.com/in/vy-ton/">Vy Ton</a>, product manager</li>
<li><a href="https://www.linkedin.com/in/viega/">John Viega</a>, ceo</li>
<li><a href="https://www.linkedin.com/in/benburkert/">Ben Burkert</a>, cto</li>
<li><a href="https://www.linkedin.com/in/petevilter/">Pete Vilter</a>, developer</li>
<li><a href="https://www.linkedin.com/in/seanloiselle/">Sean Loiselle</a>, developer</li>
<li><a href="https://www.linkedin.com/in/rahul-lath/">Rahul Lath</a>, vp of engineering</li>
<li><a href="https://www.linkedin.com/in/kelleymak/">Kelley Mak</a>, vc</li>
<li><a href="https://www.linkedin.com/in/ramrengaswamy/">Ram Kumar Rengaswamy</a>, cofounder</li>
<li><a href="https://www.linkedin.com/in/oridb/">Ori Bernstein</a>, consultant</li>
<li><a href="https://www.linkedin.com/in/mitchsw/">Mitch Ward</a>, director of engineering</li>
<li><a href="https://www.linkedin.com/in/philippemnoel/">Philippe Noël</a>, ceo</li>
<li><a href="https://www.linkedin.com/in/paulgb/">Paul Butler</a>, ceo</li>
<li><a href="https://www.linkedin.com/in/mathable/">Abel Mathew</a>, cofounder</li>
<li><a href="https://www.linkedin.com/in/apacker/">Andrew Packer</a>, developer</li>
<li><a href="https://www.linkedin.com/in/clipperhouse/">Matt Sherman</a>, engineering manager</li>
<li><a href="https://www.linkedin.com/in/seshendranalla/">Sesh Nalla</a>, director of engineering</li>
<li><a href="https://www.linkedin.com/in/andrei-matei-9401083/">Andrei Matei</a>, cofounder</li>
<li><a href="https://www.linkedin.com/in/ryanmwexler/">Ryan Wexler</a>, vc</li>
<li><a href="https://www.linkedin.com/in/alexkesling/">Alex Kesling</a>, cto</li>
<li><a href="https://www.linkedin.com/in/larrytheliquid/">Larry Diehl</a>, ceo</li>
<li><a href="https://www.linkedin.com/in/will-manning-maker-of-things/">Will Manning</a>, ceo</li>
<li><a href="https://www.linkedin.com/in/paul-nowoczynski-42a5267/">Paul Nowoczynski</a>, founder</li>
<li><a href="https://www.linkedin.com/in/alexsarkesian/">Alex Sarkesian</a>, developer</li>
<li><a href="https://www.linkedin.com/in/meganalicereynolds/">Megan Reynolds</a>, vc</li>
<li><a href="https://www.linkedin.com/in/nikhilbenesch/">Nikhil Benesch</a>, cto</li>
<li><a href="https://www.linkedin.com/in/saleh-hindi/">Saleh Hindi</a>, founder</li>
<li><a href="https://www.linkedin.com/in/stephaniewang526/">Stephanie Wang</a>, developer</li>
<li><a href="https://www.linkedin.com/in/just-be/">Justin Bennett</a>, cofounder</li>
<li><a href="https://www.linkedin.com/in/evanmarkschwartz/">Evan Schwartz</a>, developer</li>
<li><a href="https://www.linkedin.com/in/ekzhang/">Eric Zhang</a>, developer</li>
</ul>
http://notes.eatonphil.com/2024-11-15-active-nyc-infrastructure-people.htmlFri, 15 Nov 2024 00:00:00 +0000
- Exploring Postgres's arena allocator by writing an HTTP server from scratchhttp://notes.eatonphil.com/2024-11-06-exploring-postgress-arena-allocator-writing-http-server-scratch.html<head>
<meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/exploring-postgress-arena-allocator-writing-http-server-scratch'" />
</head><p>This is an external post of mine. Click
<a href="https://www.enterprisedb.com/blog/exploring-postgress-arena-allocator-writing-http-server-scratch">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2024-11-06-exploring-postgress-arena-allocator-writing-http-server-scratch.htmlWed, 06 Nov 2024 00:00:00 +0000
- Effective unemployment and social mediahttp://notes.eatonphil.com/2024-11-05-effective-unemployment-and-social-media.html<p>Being unemployed can be incredibly depressing. So much
rejection. Everything seems to be out of your control. Everything
except for one thing: what you produce.</p>
<p>You might know that repeatedly posting on social media that you are
looking for work is ineffective. That it looks (or at least feels)
worse each time you say so. But there is at least one major caveat to
this.</p>
<p>Every single time you create something and share it publicly is a
chance to also reiterate that you are looking for work. And people
actually appreciate and value this!</p>
<p>Whether you write a blog post or build some project, you are seen as
working on yourself and contributing to the community. Positive
things! And it is no problem at all to mention with each new post you
write and each new project you publish that you are also looking for
work.</p>
<p>Moreover, dynamics of the internet and social media basically require
that you be regularly producing something new. Either regularly
producing a new version of some existing project or regularly
producing new projects (or blog posts) entirely.</p>
<p>What you did a week ago is old news on social media. What will you do
next week?</p>
<p>This could itself feel depressing except that it's probably
actually a fairly healthy thing for yourself anyway! It is a
motivation to keep your skills sharp as time goes on.</p>
<p>So while you're unemployed and able to muster the motivation, write
about things that are interesting to you! Build projects that intrigue
you. Leave a little note on every post and project that you are
looking for work. And share every post and project on social media.</p>
<p>You'll expose yourself to opportunities and referrals. And even if no
post or project "takes off" you will still be working on yourself and
contributing back knowledge to the community.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a short post on some ideas for effective unemployment and social media.<a href="https://t.co/jmiJCOe2Nk">https://t.co/jmiJCOe2Nk</a> <a href="https://t.co/pK9AySNdHR">pic.twitter.com/pK9AySNdHR</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1853800075564109880?ref_src=twsrc%5Etfw">November 5, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-11-05-effective-unemployment-and-social-media.htmlTue, 05 Nov 2024 00:00:00 +0000
- Checking linearizability in Gohttp://notes.eatonphil.com/2024-10-31-checking-linearizability-in-go.html
<p>You want to check for strict consistency
(<a href="https://jepsen.io/consistency/models/linearizable">linearizability</a>)
for your project but you don't want to have to <a href="https://github.com/jepsen-io/">deal with the
JVM</a>. <a href="https://github.com/anishathalye/porcupine">Porcupine</a>,
used by a number of real-world systems like etcd and TiDB, has you
covered!</p>
<p>Importantly, neither Jepsen projects nor Porcupine can <em>prove</em>
linearizability. They can only help you <em>build confidence</em> that you
aren't obviously <em>violating</em> linearizability.</p>
<p>The Porcupine README is pretty good but doesn't give complete working
code, so I'm going to walk through checking linearizability of a
distributed register. And then we'll tweak things a bit by checking
linearizability for a distributed key-value store.</p>
<p>But rather than implementing a distributed register and implementing a
distributed key-value store, to keep this post concise, we're just
going to imagine that they exist and we'll come up with some example
histories we might see.</p>
<p>Code for this post can be found on
<a href="https://github.com/eatonphil/linearizability-playground">GitHub</a>.</p>
<h3 id="boilerplate">Boilerplate</h3><p>Create a new directory and <code>go mod init lintest</code>. Let's add the
imports we need and a helper function for generating a visualization
of a history, in <code>main.go</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="s">"os"</span>
<span class="kn">import</span><span class="w"> </span><span class="s">"log"</span>
<span class="kn">import</span><span class="w"> </span><span class="s">"github.com/anishathalye/porcupine"</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">visualizeTempFile</span><span class="p">(</span><span class="nx">model</span><span class="w"> </span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Model</span><span class="p">,</span><span class="w"> </span><span class="nx">info</span><span class="w"> </span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">LinearizationInfo</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">file</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">CreateTemp</span><span class="p">(</span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"*.html"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"failed to create temp file"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Visualize</span><span class="p">(</span><span class="nx">model</span><span class="p">,</span><span class="w"> </span><span class="nx">info</span><span class="p">,</span><span class="w"> </span><span class="nx">file</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"visualization failed"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"wrote visualization to %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">file</span><span class="p">.</span><span class="nx">Name</span><span class="p">())</span>
<span class="p">}</span>
</pre></div>
<h3 id="a-distributed-register">A distributed register</h3><p>A distributed register is like a distributed key-value store but
there's only a single key.</p>
<p>We need to tell Porcupine what the inputs and outputs for this system
are. And we'll later describe for it how an idealized version of this
system should behave as it receives each input; what output the
idealized version should produce.</p>
<p>Each time we send a command to the distributed register it will
include an operation (to get or to set the register). And if it is a
set command it will include a value.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">registerInput</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="c1">// "get" or "set"</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">int</span>
<span class="p">}</span>
</pre></div>
<p>The register is an integer register.</p>
<p>Now we will define a model for Porcupine which, again, is the
idealized version of this system.</p>
<div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">registerModel</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">porcupine</span><span class="o">.</span><span class="n">Model</span><span class="p">{</span>
<span class="w"> </span><span class="n">Init</span><span class="p">:</span><span class="w"> </span><span class="k">func</span><span class="p">()</span><span class="w"> </span><span class="n">any</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="n">Step</span><span class="p">:</span><span class="w"> </span><span class="k">func</span><span class="p">(</span><span class="n">stateAny</span><span class="p">,</span><span class="w"> </span><span class="n">inputAny</span><span class="p">,</span><span class="w"> </span><span class="n">outputAny</span><span class="w"> </span><span class="n">any</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb nb-Type">bool</span><span class="p">,</span><span class="w"> </span><span class="n">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">input</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">inputAny</span><span class="o">.</span><span class="p">(</span><span class="n">registerInput</span><span class="p">)</span>
<span class="w"> </span><span class="n">output</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">outputAny</span><span class="o">.</span><span class="p">(</span><span class="nb nb-Type">int</span><span class="p">)</span>
<span class="w"> </span><span class="n">state</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">stateAny</span><span class="o">.</span><span class="p">(</span><span class="nb nb-Type">int</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">input</span><span class="o">.</span><span class="n">operation</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"set"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="bp">true</span><span class="p">,</span><span class="w"> </span><span class="n">input</span><span class="o">.</span><span class="n">value</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">input</span><span class="o">.</span><span class="n">operation</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"get"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">readCorrectValue</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">output</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">state</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">readCorrectValue</span><span class="p">,</span><span class="w"> </span><span class="n">state</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">panic</span><span class="p">(</span><span class="s2">"Unexpected operation"</span><span class="p">)</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>The step function accepts anything because it has to be able to model
any sort of system with its different inputs and outputs and current
state. So we have to handle casting from the <code>any</code> type to what we
know are the inputs and outputs and state. And finally we actually do
the state change and return the new state along with whether the given
output matches what we know it should be.</p>
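<p>To get a feel for what the step function does before involving
Porcupine at all, here is a minimal standalone sketch (separate from
the <code>main.go</code> above) that replays a purely sequential list
of observations through the same logic. The <code>observed</code> type
and the <code>replay</code> helper are hypothetical names made up for
this illustration; they are not part of Porcupine.</p>

```go
package main

import "fmt"

// registerInput mirrors the input type defined earlier in the post.
type registerInput struct {
	operation string // "get" or "set"
	value     int
}

// step has the same shape as the Step field of the model: it takes the
// current state, an input, and the output the real system reported, and
// returns whether that output is legal plus the next state.
func step(stateAny, inputAny, outputAny any) (bool, any) {
	input := inputAny.(registerInput)
	output := outputAny.(int)
	state := stateAny.(int)

	switch input.operation {
	case "set":
		// A set is always legal and replaces the state.
		return true, input.value
	case "get":
		// A get is legal only if it returned the current state.
		return output == state, state
	}
	panic("unexpected operation")
}

// observed is a hypothetical helper type pairing an input with the
// output the system reported for it.
type observed struct {
	input  registerInput
	output int
}

// replay applies step to each observation in order, starting from the
// initial state 0. It returns the index of the first illegal output,
// or -1 if every output was legal.
func replay(history []observed) int {
	var state any = 0
	for i, o := range history {
		ok, next := step(state, o.input, o.output)
		if !ok {
			return i
		}
		state = next
	}
	return -1
}

func main() {
	good := []observed{
		{registerInput{"set", 100}, 100},
		{registerInput{"get", 0}, 100},
	}
	bad := []observed{
		{registerInput{"set", 100}, 100},
		{registerInput{"set", 200}, 200},
		{registerInput{"get", 0}, 100}, // stale read
	}
	fmt.Println(replay(good), replay(bad)) // prints "-1 2"
}
```

<p>This only handles operations that happen strictly one after
another. The hard part, and the reason to reach for a checker, is
deciding which sequential orders are allowed when requests overlap in
time.</p>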
<h3 id="an-invalid-history">An invalid history</h3><p>Now we've only defined the idealized version of this system. Let's
pretend we have some real-world implementation of this. We might have
two clients and they might issue concurrent get and set requests.</p>
<p>Every time we stimulate the system we will generate a new history that
we can validate with Porcupine against our model to see if the history
is linearizable.</p>
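<p>Since we are only imagining the real system in this post, here is
a rough sketch of how a history might be recorded if one did exist.
Everything in it is an assumption for illustration: the
<code>register</code> type is a toy in-memory stand-in for the real
system, <code>recorder</code> is a made-up helper, and the local
<code>operation</code> struct just copies the field names of
<code>porcupine.Operation</code> so the sketch runs with only the
standard library. A shared logical clock is ticked before and after
each request to produce the call and return times.</p>

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type registerInput struct {
	operation string // "get" or "set"
	value     int
}

// operation is a local stand-in with the same field names as
// porcupine.Operation.
type operation struct {
	ClientId int
	Input    any
	Call     int64
	Output   any
	Return   int64
}

// register is a toy in-memory register standing in for the real system.
type register struct {
	mu    sync.Mutex
	value int
}

// do applies a get or set and returns the register's value afterwards.
func (r *register) do(in registerInput) int {
	r.mu.Lock()
	defer r.mu.Unlock()
	if in.operation == "set" {
		r.value = in.value
	}
	return r.value
}

// recorder (hypothetical) collects one operation per request.
type recorder struct {
	clock atomic.Int64 // shared logical clock
	mu    sync.Mutex
	ops   []operation
}

// record ticks the clock, issues the request, ticks the clock again,
// and appends the resulting operation to the history.
func (rec *recorder) record(clientID int, r *register, in registerInput) int {
	call := rec.clock.Add(1)
	out := r.do(in)
	ret := rec.clock.Add(1)

	rec.mu.Lock()
	rec.ops = append(rec.ops, operation{clientID, in, call, out, ret})
	rec.mu.Unlock()
	return out
}

func main() {
	var rec recorder
	reg := &register{}

	// Two clients issue a set followed by a get, concurrently.
	var wg sync.WaitGroup
	for _, clientID := range []int{3, 5} {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			rec.record(id, reg, registerInput{"set", id * 100})
			rec.record(id, reg, registerInput{"get", 0})
		}(clientID)
	}
	wg.Wait()

	// The interleaving (and so the printed history) varies run to run.
	for _, op := range rec.ops {
		fmt.Printf("%+v\n", op)
	}
}
```

<p>Because the toy register is protected by a mutex, histories recorded
this way should always be linearizable. A real distributed system is
where the interesting histories come from.</p>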
<p>Let's imagine these two clients each set the register to some
value, one after the other. Both sets succeed. Then both clients read the register. And
they get different values. Here's what that history would look like
modeled for Porcupine.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">ops</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Operation</span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Client 3 sets the register to 100. The request starts at t0 and ends at t2.</span>
<span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span><span class="cm">/* end state at t2 is 100 */</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">},</span>
<span class="w"> </span><span class="c1">// Client 5 sets the register to 200. The request starts at t3 and ends at t4.</span>
<span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="cm">/* end state at t4 is 200 */</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">},</span>
<span class="w"> </span><span class="c1">// Client 3 reads the register. The request starts at t5 and ends at t6.</span>
<span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn't matter */</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">},</span>
<span class="w"> </span><span class="c1">// Client 5 reads the register. The request starts at t7 and ends at t8. Reads a stale value!</span>
<span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn't matter */</span><span class="p">},</span><span class="w"> </span><span class="mi">7</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">info</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">CheckOperationsVerbose</span><span class="p">(</span><span class="nx">registerModel</span><span class="p">,</span><span class="w"> </span><span class="nx">ops</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="nx">visualizeTempFile</span><span class="p">(</span><span class="nx">registerModel</span><span class="p">,</span><span class="w"> </span><span class="nx">info</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"expected operations to be linearizable"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>If we build and run this code:</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="k">mod</span><span class="w"> </span><span class="n">tidy</span>
<span class="k">go</span><span class="err">:</span><span class="w"> </span><span class="n">finding</span><span class="w"> </span><span class="k">module</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">package</span><span class="w"> </span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">anishathalye</span><span class="o">/</span><span class="n">porcupine</span>
<span class="k">go</span><span class="err">:</span><span class="w"> </span><span class="k">found</span><span class="w"> </span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">anishathalye</span><span class="o">/</span><span class="n">porcupine</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">anishathalye</span><span class="o">/</span><span class="n">porcupine</span><span class="w"> </span><span class="n">v0</span><span class="mf">.1.6</span>
<span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="n">build</span>
<span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">lintest</span>
<span class="mi">2024</span><span class="o">/</span><span class="mi">10</span><span class="o">/</span><span class="mi">31</span><span class="w"> </span><span class="mi">19</span><span class="err">:</span><span class="mi">54</span><span class="err">:</span><span class="mi">08</span><span class="w"> </span><span class="n">wrote</span><span class="w"> </span><span class="n">visualization</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="o">/</span><span class="nf">var</span><span class="o">/</span><span class="n">folders</span><span class="o">/</span><span class="n">cb</span><span class="o">/</span><span class="n">v27m749d0sj89h9ydfq0f0940000gn</span><span class="o">/</span><span class="n">T</span><span class="o">/</span><span class="mf">463308000.</span><span class="n">html</span>
<span class="nl">panic</span><span class="p">:</span><span class="w"> </span><span class="n">expected</span><span class="w"> </span><span class="n">operations</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">be</span><span class="w"> </span><span class="n">linearizable</span>
<span class="n">goroutine</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">[</span><span class="n">running</span><span class="o">]</span><span class="err">:</span>
<span class="n">main</span><span class="p">.</span><span class="n">main</span><span class="p">()</span>
<span class="w"> </span><span class="o">/</span><span class="n">Users</span><span class="o">/</span><span class="n">phil</span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">lintest</span><span class="o">/</span><span class="n">main</span><span class="p">.</span><span class="k">go</span><span class="err">:</span><span class="mi">59</span><span class="w"> </span><span class="o">+</span><span class="mh">0x394</span>
</pre></div>
<p>Porcupine caught the stale value. Open that HTML file to see
the visualization.</p>
<p><img src="/assets/bad-register-history.png" alt="/assets/bad-register-history.png"></p>
<h3 id="a-valid-history">A valid history</h3><p>Let's say we fix the bug so now there's no stale read. The new history
would look like this:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">ops</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Operation</span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Client 3 sets the register to 100. The request starts at t0 and ends at t2.</span>
<span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span><span class="cm">/* end state at t2 is 100 */</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">},</span>
<span class="w"> </span><span class="c1">// Client 5 sets the register to 200. The request starts at t3 and ends at t4.</span>
<span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="cm">/* end state at t4 is 200 */</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">},</span>
<span class="w"> </span><span class="c1">// Client 3 reads the register. The request starts at t5 and ends at t6.</span>
<span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn't matter */</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">},</span>
<span class="w"> </span><span class="c1">// Client 5 reads the register. The request starts at t7 and ends at t8.</span>
<span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn't matter */</span><span class="p">},</span><span class="w"> </span><span class="mi">7</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Rebuild, rerun <code>lintest</code> (it should exit successfully now), and open
the visualization.</p>
<p><img src="/assets/good-register-history.png" alt="/assets/good-register-history.png"></p>
<p>Great! Now let's make things a little more complicated by modeling a
distributed key-value store rather than a distributed register.</p>
<h3 id="distributed-key-value">Distributed key-value</h3><p>The inputs of this system will be slightly more complex. They will
take a <code>key</code> along with the <code>operation</code> and <code>value</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">kvInput</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="c1">// "get" and "set"</span>
<span class="w"> </span><span class="nx">key</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">int</span>
<span class="p">}</span>
</pre></div>
<p>And we model the distributed key-value store with the state and
output at each step being a <code>map[string]int</code>.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">kvModel</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">porcupine</span><span class="p">.</span><span class="n">Model</span><span class="err">{</span>
<span class="w"> </span><span class="nl">Init</span><span class="p">:</span><span class="w"> </span><span class="n">func</span><span class="p">()</span><span class="w"> </span><span class="ow">any</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span class="nc">int</span><span class="err">{}</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">Step</span><span class="p">:</span><span class="w"> </span><span class="n">func</span><span class="p">(</span><span class="n">stateAny</span><span class="p">,</span><span class="w"> </span><span class="n">inputAny</span><span class="p">,</span><span class="w"> </span><span class="n">outputAny</span><span class="w"> </span><span class="ow">any</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">bool</span><span class="p">,</span><span class="w"> </span><span class="ow">any</span><span class="p">)</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">input</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">inputAny</span><span class="p">.(</span><span class="n">kvInput</span><span class="p">)</span>
<span class="w"> </span><span class="k">output</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">outputAny</span><span class="p">.(</span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span class="nc">int</span><span class="p">)</span>
<span class="w"> </span><span class="k">state</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">stateAny</span><span class="p">.(</span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span class="nc">int</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="k">input</span><span class="p">.</span><span class="k">operation</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="ss">"set"</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">newState</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span class="nc">int</span><span class="err">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">k</span><span class="p">,</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="k">state</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">newState</span><span class="o">[</span><span class="n">k</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">v</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="n">newState</span><span class="o">[</span><span class="n">input.key</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">input</span><span class="p">.</span><span class="k">value</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">true</span><span class="p">,</span><span class="w"> </span><span class="n">newState</span>
<span class="w"> </span><span class="err">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="k">input</span><span class="p">.</span><span class="k">operation</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="ss">"get"</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">readCorrectValue</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="k">output</span><span class="o">[</span><span class="n">input.key</span><span class="o">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="k">state</span><span class="o">[</span><span class="n">input.key</span><span class="o">]</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">readCorrectValue</span><span class="p">,</span><span class="w"> </span><span class="k">state</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="n">panic</span><span class="p">(</span><span class="ss">"Unexpected operation"</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span><span class="p">,</span>
<span class="w"> </span><span class="err">}</span>
</pre></div>
<p>The history also gets slightly more complex because each operation
now names a specific key. But we'll otherwise use the same
history as before.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">ops</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Operation</span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Client 3 set key `a` to 100. The request starts at t0 and ends at t2.</span>
<span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">kvInput</span><span class="p">{</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="s">"a"</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{</span><span class="s">"a"</span><span class="p">:</span><span class="w"> </span><span class="mi">100</span><span class="p">},</span><span class="w"> </span><span class="mi">2</span><span class="p">},</span>
<span class="w"> </span><span class="c1">// Client 5 set key `a` to 200. The request starts at t3 and ends at t4.</span>
<span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">kvInput</span><span class="p">{</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="s">"a"</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{</span><span class="s">"a"</span><span class="p">:</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">4</span><span class="p">},</span>
<span class="w"> </span><span class="c1">// Client 3 read key `a`. The request starts at t5 and ends at t6.</span>
<span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">kvInput</span><span class="p">{</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="s">"a"</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn't matter */</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{</span><span class="s">"a"</span><span class="p">:</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">6</span><span class="p">},</span>
<span class="w"> </span><span class="c1">// Client 5 read key `a`. The request starts at t7 and ends at t8.</span>
<span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">kvInput</span><span class="p">{</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="s">"a"</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn't matter */</span><span class="p">},</span><span class="w"> </span><span class="mi">7</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{</span><span class="s">"a"</span><span class="p">:</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">8</span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Build and run. Open the visualization.</p>
<p><img src="/assets/good-kv-history.png" alt="/assets/good-kv-history.png"></p>
<p>And there we go!</p>
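<p>Since none of the requests in this history overlap in time, only one
ordering is possible, so you can sanity-check it by hand without
Porcupine. Here is a minimal sketch of that idea. The <code>op</code>
struct, <code>step</code>, and <code>replay</code> are hypothetical
stand-ins written for this example (they are not part of Porcupine),
and replaying in order like this is only a valid check when no two
operations overlap.</p>

```go
package main

import "fmt"

// kvInput mirrors the struct from this post.
type kvInput struct {
	operation string // "get" or "set"
	key       string
	value     int
}

// op is a hypothetical, trimmed-down stand-in for porcupine.Operation.
type op struct {
	clientId int
	input    kvInput
	call     int64
	output   map[string]int
	ret      int64
}

// step has the same logic as kvModel.Step, but with concrete types.
func step(state map[string]int, input kvInput, output map[string]int) (bool, map[string]int) {
	switch input.operation {
	case "set":
		newState := map[string]int{}
		for k, v := range state {
			newState[k] = v
		}
		newState[input.key] = input.value
		return true, newState
	case "get":
		return output[input.key] == state[input.key], state
	}
	panic("Unexpected operation")
}

// replay applies ops in order and returns the index of the first
// operation the model rejects, or -1 if every operation is accepted.
// Assumption: no two operations overlap in time.
func replay(ops []op) int {
	state := map[string]int{}
	for i, o := range ops {
		ok, next := step(state, o.input, o.output)
		if !ok {
			return i
		}
		state = next
	}
	return -1
}

func main() {
	ops := []op{
		{3, kvInput{"set", "a", 100}, 0, map[string]int{"a": 100}, 2},
		{5, kvInput{"set", "a", 200}, 3, map[string]int{"a": 200}, 4},
		{3, kvInput{"get", "a", 0}, 5, map[string]int{"a": 200}, 6},
		{5, kvInput{"get", "a", 0}, 7, map[string]int{"a": 200}, 8},
	}
	fmt.Println("first rejected operation:", replay(ops))
}
```

<p>Once requests overlap, there may be many candidate orderings, and
searching them is the work Porcupine does for you.</p>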
<h3 id="what's-next">What's next</h3><p>These are just a few simple examples that are not hooked up to a real
system. But it still seemed useful to show how to model a couple of
simple systems and check a history with Porcupine.</p>
<p>Another aspect of Porcupine I did not cover is partitioning the state
space. The
<a href="https://pkg.go.dev/github.com/anishathalye/porcupine#Model">docs</a>
say:</p>
<blockquote><p>Implementing the partition functions can greatly improve
performance. If you're implementing the partition function, the
model Init and Step functions can be per-partition. For example, if
your specification is for a key-value store and you partition by
key, then the per-partition state representation can just be a
single value rather than a map.</p>
</blockquote>
<p>Perhaps that, and hooking this up to some "real" system, would be a
good next step.</p>
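<p>To make the partitioning idea a bit more concrete, here is a rough
sketch of grouping a key-value history by key so that each partition's
state is a single <code>int</code> rather than a map. As I understand
the docs, Porcupine's <code>Model</code> takes a <code>Partition</code>
function over <code>[]porcupine.Operation</code>. The <code>op</code>
type and the <code>partitionByKey</code> and <code>step</code> functions
below are hypothetical stand-ins so the example runs without the
library, and I have not wired this into Porcupine.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// kvInput mirrors the struct from this post.
type kvInput struct {
	operation string // "get" or "set"
	key       string
	value     int
}

// op is a hypothetical stand-in for porcupine.Operation. Because each
// partition covers one key, the output is a single int, not a map.
type op struct {
	clientId int
	input    kvInput
	call     int64
	output   int
	ret      int64
}

// partitionByKey groups operations by the key they touch. Keys are
// sorted only so the result is deterministic.
func partitionByKey(history []op) [][]op {
	byKey := map[string][]op{}
	for _, o := range history {
		byKey[o.input.key] = append(byKey[o.input.key], o)
	}
	keys := make([]string, 0, len(byKey))
	for k := range byKey {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	partitions := make([][]op, 0, len(keys))
	for _, k := range keys {
		partitions = append(partitions, byKey[k])
	}
	return partitions
}

// step is the per-partition model: the state is just the current value
// of the one key this partition covers (0 if it was never set).
func step(state int, input kvInput, output int) (bool, int) {
	if input.operation == "set" {
		return true, input.value
	}
	return output == state, state
}

func main() {
	history := []op{
		{3, kvInput{"set", "a", 100}, 0, 100, 2},
		{5, kvInput{"set", "b", 200}, 1, 200, 3},
		{3, kvInput{"get", "a", 0}, 4, 100, 5},
	}
	for _, p := range partitionByKey(history) {
		fmt.Println("key", p[0].input.key, "has", len(p), "operations")
	}
}
```

<p>The checker can then treat each key independently, which shrinks
the number of orderings it has to consider.</p>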
http://notes.eatonphil.com/2024-10-31-checking-linearizability-in-go.htmlThu, 31 Oct 2024 00:00:00 +0000
- Build a serverless ACID database with this one neat trick (atomic PutIfAbsent)http://notes.eatonphil.com/2024-09-29-build-a-serverless-acid-database-with-this-one-neat-trick.html<p>Delta Lake is an open protocol for serverless ACID databases. Due to
its simplicity, scalability, and the number of open-source
implementations, it's quickly becoming the DuckDB of serverless
transactional databases for analytics workloads. Iceberg is a
contender too, and is similar in many ways. But since Delta Lake is
simpler (simple != better) that's where we'll focus in this post.</p>
<p>Delta Lake has one of the most accessible database papers I've read
(<a href="https://www.vldb.org/pvldb/vol13/p3411-armbrust.pdf">link</a>). It's
kind of like the
<a href="https://github.com/xoreaxeaxeax/movfuscator">movfuscator</a> of
databases.</p>
<p>Thanks to its simplicity, in this post we'll implement a Delta
Lake-inspired serverless ACID database in 500 lines of Go code with
zero dependencies. It will support creating tables, inserting rows
into a table, and scanning all rows in a table. All while allowing
concurrent readers and writers and achieving <a href="https://jepsen.io/consistency">snapshot
isolation</a>.</p>
<p>There are other critical parts of Delta Lake we'll ignore: updating
rows, deleting rows, checkpointing the transaction metadata log,
compaction, and probably much more I'm not aware of. We must start
somewhere.</p>
<p>All code for this post is <a href="https://github.com/eatonphil/otf">available on GitHub</a>.</p>
<h3 id="delta-lake-basics">Delta Lake basics</h3><p>Delta Lake writes immutable data files to blob storage. It stores the
names of new data files for a transaction in a metadata file. It
handles concurrency (i.e. achieves snapshot isolation) with an atomic
PutIfAbsent operation on the metadata file for the transaction.</p>
<p>This method of concurrency control works because the metadata files
follow a naming scheme that includes the transaction id in the file
name. When a new transaction starts, it finds all existing metadata
files and picks its own transaction id by adding 1 to the largest
transaction id it sees.</p>
<p>When a transaction goes to commit, writing the metadata file will
fail if another transaction has already picked the same transaction
id.</p>
<p>If a transaction does no writes and creates no tables, the transaction
does not attempt to write any metadata file. Snapshot isolation!</p>
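<p>Before the real implementation, here is a small sketch of the commit
protocol described above: pick the next transaction id from the existing
metadata file names, then let <code>putIfAbsent</code> decide who wins.
The <code>_log_</code> prefix, the zero-padded name format, and the
map-backed <code>store</code> are assumptions made up for this example,
and the store is only safe within a single goroutine.</p>

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// metaPrefix is a hypothetical naming scheme for metadata files.
const metaPrefix = "_log_"

var errExists = errors.New("already exists")

// store is a toy stand-in for blob storage. Not safe for concurrent use.
type store map[string][]byte

func (s store) putIfAbsent(name string, data []byte) error {
	if _, ok := s[name]; ok {
		return errExists
	}
	s[name] = data
	return nil
}

func (s store) names() []string {
	var out []string
	for name := range s {
		out = append(out, name)
	}
	return out
}

// metaName builds the metadata file name for a transaction id.
func metaName(id int) string {
	return fmt.Sprintf("%s%020d", metaPrefix, id)
}

// nextTxId returns 1 + the largest transaction id among the metadata
// file names. Names without the metadata prefix are ignored.
func nextTxId(names []string) (int, error) {
	maxId := 0
	for _, name := range names {
		rest, ok := strings.CutPrefix(name, metaPrefix)
		if !ok {
			continue
		}
		id, err := strconv.Atoi(rest)
		if err != nil {
			return 0, err
		}
		if id > maxId {
			maxId = id
		}
	}
	return maxId + 1, nil
}

func main() {
	s := store{}
	// Two transactions start against the same (empty) log, so both
	// pick the same id.
	idA, _ := nextTxId(s.names())
	idB, _ := nextTxId(s.names())
	fmt.Println("A commit:", s.putIfAbsent(metaName(idA), []byte("A")))
	fmt.Println("B commit:", s.putIfAbsent(metaName(idB), []byte("B")))
}
```

<p>The second commit fails because the first already created the file
for that id. The losing transaction has to give up or retry with a
fresh id.</p>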
<p>Let's dig into the implementation.</p>
<h3 id="boilerplate">Boilerplate</h3><p>Let's give ourselves some nice assertion methods, a debug method, and
a uuid generator. In <code>main.go</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"encoding/json"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"io"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"path"</span>
<span class="w"> </span><span class="s">"slices"</span>
<span class="w"> </span><span class="s">"strings"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">b</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">assertEq</span><span class="p">[</span><span class="nx">C</span><span class="w"> </span><span class="kt">comparable</span><span class="p">](</span><span class="nx">a</span><span class="w"> </span><span class="nx">C</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">C</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"%s '%v' != '%v'"</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">var</span><span class="w"> </span><span class="nx">DEBUG</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">slices</span><span class="p">.</span><span class="nx">Contains</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">,</span><span class="w"> </span><span class="s">"--debug"</span><span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="o">...</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">DEBUG</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">append</span><span class="p">([]</span><span class="kt">any</span><span class="p">{</span><span class="s">"[DEBUG]"</span><span class="p">},</span><span class="w"> </span><span class="nx">a</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">args</span><span class="o">...</span><span class="p">)</span>
<span class="p">}</span>
<span class="c1">// https://datatracker.ietf.org/doc/html/rfc4122#section-4.4</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">uuidv4</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="s">"/dev/random"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"could not open /dev/random: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">))</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">16</span><span class="p">)</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"could not read 16 bytes from /dev/random: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">))</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">buf</span><span class="p">),</span><span class="w"> </span><span class="s">"expected 16 bytes from /dev/random"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Set bit 6 to 0</span>
<span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span><span class="w"> </span><span class="o">&=</span><span class="w"> </span><span class="p">^(</span><span class="nb">byte</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="mi">6</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Set bit 7 to 1</span>
<span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span><span class="w"> </span><span class="o">|=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="mi">7</span>
<span class="w"> </span><span class="c1">// Set version</span>
<span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">&=</span><span class="w"> </span><span class="p">^(</span><span class="nb">byte</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="mi">4</span><span class="p">)</span>
<span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">&=</span><span class="w"> </span><span class="p">^(</span><span class="nb">byte</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="mi">5</span><span class="p">)</span>
<span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">|=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="mi">6</span>
<span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">&=</span><span class="w"> </span><span class="p">^(</span><span class="nb">byte</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="mi">7</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"%x-%x-%x-%x-%x"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">buf</span><span class="p">[:</span><span class="mi">4</span><span class="p">],</span>
<span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">4</span><span class="p">:</span><span class="mi">6</span><span class="p">],</span>
<span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">6</span><span class="p">:</span><span class="mi">8</span><span class="p">],</span>
<span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">8</span><span class="p">:</span><span class="mi">10</span><span class="p">],</span>
<span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">10</span><span class="p">:</span><span class="mi">16</span><span class="p">])</span>
<span class="p">}</span>
</pre></div>
<p>Is that uuid method correct? Hopefully. Efficient? No. But it's
preferable to avoid dependencies in pedagogical projects.</p>
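<p>If opening <code>/dev/random</code> on every call bothers you, one
alternative that still needs no third-party dependency is Go's standard
<code>crypto/rand</code> package. This is a sketch of a variation, not
the code the rest of this post uses. The name
<code>uuidv4CryptoRand</code> is made up for this example, and it
returns an error rather than asserting.</p>

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// uuidv4CryptoRand builds a version 4 UUID from 16 random bytes, as
// described in RFC 4122 section 4.4.
func uuidv4CryptoRand() (string, error) {
	var buf [16]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return "", err
	}

	// Variant: top two bits of byte 8 become 1 0.
	buf[8] = (buf[8] & 0x3f) | 0x80
	// Version: top four bits of byte 6 become 0 1 0 0 (version 4).
	buf[6] = (buf[6] & 0x0f) | 0x40

	return fmt.Sprintf("%x-%x-%x-%x-%x",
		buf[:4],
		buf[4:6],
		buf[6:8],
		buf[8:10],
		buf[10:16]), nil
}

func main() {
	id, err := uuidv4CryptoRand()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

<p>The two masks do the same bit twiddling as the version above, just
in one step per byte.</p>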
<p>Moving on.</p>
<h3 id="blob-storage-requirements">Blob storage requirements</h3><p>As mentioned above, the basic requirement is that we support
atomically writing some bytes to a location if the location doesn't
already exist.</p>
<p>On top of that we also need the ability to list locations by prefix,
and the ability to read the bytes at some location.</p>
<p class="note">
We'll diverge from Delta Lake in how we name files on disk. For one,
we'll keep all files in the same directory, with a fixed prefix for
metadata files and a table-name prefix for each data file. This
simplifies the implementation of <code>listPrefix</code> a bit.
<br />
<br />
However, this also diverges from Delta Lake in that a transaction
here will span all tables. In Delta Lake that is not so. Delta Lake
has a per-table transaction log. Only transactions that read and
write the same table in Delta Lake achieve snapshot isolation.
</p><p>So let's set up an interface to describe these requirements:</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">objectStorage</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Must be atomic.</span>
<span class="w"> </span><span class="nx">putIfAbsent</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="nx">listPrefix</span><span class="p">(</span><span class="nx">prefix</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span>
<span class="w"> </span><span class="nx">read</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>And this is literally all we need to get ACID transactions. That's crazy!</p>
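<p>To make the contract concrete, here is a minimal in-memory sketch of the
interface, guarded by a mutex. It is only useful inside a single process (for
tests, say). The names <code>memObjectStorage</code>, <code>errObjectExists</code>
and <code>errNoObject</code> are made up for this sketch and are not part of the
project.</p>

```go
package main

import (
	"errors"
	"sort"
	"strings"
	"sync"
)

// objectStorage repeats the interface above so this sketch compiles alone.
type objectStorage interface {
	putIfAbsent(name string, bytes []byte) error
	listPrefix(prefix string) ([]string, error)
	read(name string) ([]byte, error)
}

// Hypothetical sentinel errors for this sketch.
var (
	errObjectExists = errors.New("object already exists")
	errNoObject     = errors.New("no such object")
)

// memObjectStorage keeps every object in a map protected by a mutex.
type memObjectStorage struct {
	mu      sync.Mutex
	objects map[string][]byte
}

func newMemObjectStorage() *memObjectStorage {
	return &memObjectStorage{objects: map[string][]byte{}}
}

// putIfAbsent is atomic because the check and the write share one lock.
func (m *memObjectStorage) putIfAbsent(name string, bytes []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, ok := m.objects[name]; ok {
		return errObjectExists
	}
	// Copy so later changes to the caller's slice don't mutate the object.
	m.objects[name] = append([]byte(nil), bytes...)
	return nil
}

func (m *memObjectStorage) listPrefix(prefix string) ([]string, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	var names []string
	for name := range m.objects {
		if strings.HasPrefix(name, prefix) {
			names = append(names, name)
		}
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	return names, nil
}

func (m *memObjectStorage) read(name string) ([]byte, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	b, ok := m.objects[name]
	if !ok {
		return nil, errNoObject
	}
	return append([]byte(nil), b...), nil
}

// Compile-time check that the sketch satisfies the interface.
var _ objectStorage = (*memObjectStorage)(nil)
```

<p>A second <code>putIfAbsent</code> call with the same name returns
<code>errObjectExists</code> and leaves the first write untouched, which is the
only behavior the rest of the design relies on.</p>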
<h4 id="atomic-put-and-cloud-blob-storage">Atomic Put and cloud blob storage</h4><p>We could implement the atomic <code>putIfAbsent</code> part of this interface in
2024 using <a href="https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/">conditional
writes</a>
on S3. Or we could implement this interface with the <code>If-None-Match</code>
<a href="https://learn.microsoft.com/en-us/rest/api/storageservices/specifying-conditional-headers-for-blob-service-operations">header</a>
on Azure Blob Storage. Or we could implement this interface with the
<code>x-goog-if-generation-match</code>
<a href="https://cloud.google.com/storage/docs/xml-api/put-object">header</a> on
Google Cloud Storage.</p>
<p>Indeed a good exercise for the reader would be to implement this
interface for other blob storage providers and see your serverless
cloud database in action!</p>
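<p>As a starting point for that exercise, here is a rough sketch of a
conditional PUT over plain HTTP using the <code>If-None-Match: *</code> header.
Everything here is an assumption for illustration: <code>conditionalPut</code>,
<code>errAlreadyExists</code> and <code>newFakeBlobServer</code> are made-up
names, the fake server only stands in for a real provider, and real providers
additionally require authentication and may signal "already exists" with a
different status code than the <code>412</code> assumed here. Google Cloud
Storage uses its own <code>x-goog-if-generation-match</code> header instead.</p>

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// errAlreadyExists is a hypothetical sentinel for a lost conditional write.
var errAlreadyExists = errors.New("object already exists")

// conditionalPut asks the server to store body at url only if nothing is
// stored there yet. It assumes the server answers 412 when the object exists.
func conditionalPut(c *http.Client, url string, body []byte) error {
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("If-None-Match", "*")
	resp, err := c.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK, http.StatusCreated:
		return nil
	case http.StatusPreconditionFailed:
		return errAlreadyExists
	default:
		return fmt.Errorf("unexpected status: %s", resp.Status)
	}
}

// newFakeBlobServer is a local stand-in for a blob store. It only remembers
// which paths were written and honors If-None-Match: * on PUT.
func newFakeBlobServer() *httptest.Server {
	var mu sync.Mutex
	objects := map[string]bool{}
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		if r.Method != http.MethodPut {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		if r.Header.Get("If-None-Match") == "*" && objects[r.URL.Path] {
			w.WriteHeader(http.StatusPreconditionFailed)
			return
		}
		objects[r.URL.Path] = true
		w.WriteHeader(http.StatusOK)
	}))
}
```

<p>Against the fake server, the first <code>conditionalPut</code> for a path
succeeds and the second returns <code>errAlreadyExists</code>. Wrapping a call
like this in <code>putIfAbsent</code> is most of the work of porting the
interface to a real provider.</p>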
<p>But the simplest method of all is to implement it on the filesystem,
which is what we'll do next.</p>
<h3 id="a-filesystem-blob-store">A filesystem blob store</h3><p>If we had a server we could implement atomic <code>putIfAbsent</code> with a
mutex. But we're serverless, baby. Thankfully, POSIX <a href="https://rcrowley.org/2010/01/06/things-unix-can-do-atomically.html">supports atomic
link</a>
which will fail if the new name already exists.</p>
<p>So we'll just create a temporary file and write out all
bytes. Finally, we link the temporary file to the permanent name we
intended. For cleanliness (not correctness), if there is an error at
any point, we'll remove the temporary file.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">fileObjectStorage</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">basedir</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">newFileObjectStorage</span><span class="p">(</span><span class="nx">basedir</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="nx">fileObjectStorage</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">fileObjectStorage</span><span class="p">{</span><span class="nx">basedir</span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fos</span><span class="w"> </span><span class="o">*</span><span class="nx">fileObjectStorage</span><span class="p">)</span><span class="w"> </span><span class="nx">putIfAbsent</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tmpfilename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">fos</span><span class="p">.</span><span class="nx">basedir</span><span class="p">,</span><span class="w"> </span><span class="nx">uuidv4</span><span class="p">())</span>
<span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">OpenFile</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">O_WRONLY</span><span class="o">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_CREATE</span><span class="p">,</span><span class="w"> </span><span class="mo">0644</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">written</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nx">bufSize</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">1024</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">16</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">written</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bytes</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">toWrite</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">min</span><span class="p">(</span><span class="nx">written</span><span class="o">+</span><span class="nx">bufSize</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bytes</span><span class="p">))</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">bytes</span><span class="p">[</span><span class="nx">written</span><span class="p">:</span><span class="nx">toWrite</span><span class="p">])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">removeErr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Remove</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">removeErr</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not remove"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">written</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">n</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Sync</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">removeErr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Remove</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">removeErr</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not remove"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">removeErr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Remove</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">removeErr</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not remove"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">filename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">fos</span><span class="p">.</span><span class="nx">basedir</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Link</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">removeErr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Remove</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">removeErr</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not remove"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p class="note">
<a
href="https://news.ycombinator.com/item?id=41702593">yencabulator</a>
on HN pointed out that an earlier version of this post had a buggy
implementation of <code>putIfAbsent</code> (one that attempted to manage
atomicity solely via <code>O_EXCL | O_CREAT</code>) that would leave
potentially bad metadata files around if the <code>os.Remove</code>
call ever failed.
<br />
<br />
The <code>link</code> approach works around that because the file is
already fully and correctly written by the time we do the link.
</p><p><code>listPrefix</code> and <code>read</code> are minimal wrappers around filesystem APIs:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fos</span><span class="w"> </span><span class="o">*</span><span class="nx">fileObjectStorage</span><span class="p">)</span><span class="w"> </span><span class="nx">listPrefix</span><span class="p">(</span><span class="nx">prefix</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dir</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">fos</span><span class="p">.</span><span class="nx">basedir</span><span class="p">)</span>
<span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="nx">dir</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">files</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">names</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="nx">names</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Readdirnames</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">names</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">HasPrefix</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">files</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">files</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">files</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fos</span><span class="w"> </span><span class="o">*</span><span class="nx">fileObjectStorage</span><span class="p">)</span><span class="w"> </span><span class="nx">read</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">filename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">fos</span><span class="p">.</span><span class="nx">basedir</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">ReadFile</span><span class="p">(</span><span class="nx">filename</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>It is worth talking a bit about reading a directory though. Go doesn't
provide a nice iterator API for us and I didn't want to implement this
as callbacks with
<a href="https://pkg.go.dev/path/filepath#WalkDir"><code>path/filepath.WalkDir</code></a>.</p>
<p>We could use <a href="https://pkg.go.dev/os#File.ReadDir"><code>os.File.ReadDir</code></a>
but it allocates for all files in the directory. Sure, in a
pedagogical project we don't worry about millions of files. But the
<code>ReadDir</code> API, the error cases in particular, also isn't much simpler
than <a href="https://pkg.go.dev/os#File.Readdirnames"><code>Readdirnames</code></a>.</p>
<p class="note">
What's more, even though we iterated through batches of directory
entries, and did prefix filtering before accumulating, we still could
have considered returning an iterator here ourselves. It seems
possible and likely that the number of data files grows quite large in
a production system. But I was lazy.
</p><p>It would be nice if Go introduced an actual iterator API for
reading a directory. :)</p>
<h4 id="delta-lake-and-stale-reads">Delta Lake and stale reads</h4><p>In any case the ACID properties of Delta Lake (and Iceberg) don't
depend on being able to read up-to-date data.</p>
<p>This is because concurrent (or stale) transactions that <em>write</em> will
<em>fail on commit</em>. And also because all files written (even metadata
files) are immutable.</p>
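<p>A toy simulation of the "fail on commit" idea, separate from the real
implementation: several stale clients all believe the next transaction id is
<code>1</code> and race to create the same log entry. Here
<code>sync.Map.LoadOrStore</code> stands in for an atomic
<code>putIfAbsent</code>, and <code>raceToCommit</code> is a made-up name.</p>

```go
package main

import "sync"

// raceToCommit simulates n clients that all try to commit transaction 1 by
// creating the same log entry. It returns how many succeeded and how many
// were rejected.
func raceToCommit(n int) (winners, losers int) {
	var store sync.Map // stand-in for blob storage with an atomic put-if-absent
	var mu sync.Mutex  // protects the two counters
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(client int) {
			defer wg.Done()
			// loaded is true when another client already stored this key.
			_, loaded := store.LoadOrStore("_log_1", client)
			mu.Lock()
			defer mu.Unlock()
			if loaded {
				losers++
			} else {
				winners++
			}
		}(i)
	}
	wg.Wait()
	return winners, losers
}
```

<p>However the goroutines interleave, exactly one client wins and the rest
must fail (and could retry against a fresh snapshot).</p>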
<p>Since all data is immutable, we will always be able to read at least a
consistent snapshot of data. But we will never be able to get
SERIALIZABLE <strong>read-only</strong> transactions. This is just how Delta Lake
and Iceberg work. And it is <a href="https://jepsen.io/consistency">similar</a>
to, or better than, the consistency level any major SQL database <a href="https://github.com/ept/hermitage">gives you
by default</a>.</p>
<p>You'll see what I mean later on when we implement transaction commits.</p>
<h3 id="transaction-boilerplate">Transaction boilerplate</h3><p>Now that we've got a blob storage abstraction and a filesystem
implementation of it, let's start sketching out what a client and what
a transaction looks like.</p>
<p>In Delta Lake, a transaction consists of a list of actions. An action
might be to define a table's schema, or to add a data file, or to
remove a data file, etc. In this post we'll only implement the first
two actions.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">DataobjectAction</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">Table</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">ChangeMetadataAction</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Table</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">Columns</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="p">}</span>
<span class="c1">// an enum, only one field will be non-nil</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">Action</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">AddDataobject</span><span class="w"> </span><span class="o">*</span><span class="nx">DataobjectAction</span>
<span class="w"> </span><span class="nx">ChangeMetadata</span><span class="w"> </span><span class="o">*</span><span class="nx">ChangeMetadataAction</span>
<span class="w"> </span><span class="c1">// TODO: Support object removal.</span>
<span class="w"> </span><span class="c1">// DeleteDataobject *DataobjectAction</span>
<span class="p">}</span>
</pre></div>
<p>These fields are all exported (i.e. capitalized, if you're not
familiar with Go) because we will be writing them to disk when the
transaction commits as the transaction's metadata.</p>
<p>In fact <code>Action</code>s and the transaction's id will be the only parts of
the transaction we write to disk. Everything else will be in-memory
state.</p>
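<p>To illustrate why capitalization matters here, this sketch assumes the
metadata is serialized with <code>encoding/json</code> (an assumption; any
reflection-based encoder in Go behaves similarly). <code>txOnDisk</code> and
<code>encodeTx</code> are made-up names for the example.</p>

```go
package main

import "encoding/json"

// Copies of the action types above so the sketch compiles on its own.
type DataobjectAction struct {
	Name  string
	Table string
}

type ChangeMetadataAction struct {
	Table   string
	Columns []string
}

type Action struct {
	AddDataobject  *DataobjectAction
	ChangeMetadata *ChangeMetadataAction
}

// txOnDisk is a hypothetical cut-down transaction with one unexported field.
type txOnDisk struct {
	Id      int
	Actions map[string][]Action
	tables  map[string][]string // unexported: encoding/json skips it
}

// encodeTx returns the JSON that would be written as transaction metadata.
func encodeTx(tx txOnDisk) (string, error) {
	b, err := json.Marshal(tx)
	return string(b), err
}
```

<p>Encoding a transaction with id <code>1</code> that creates table
<code>x</code> with columns <code>a</code> and <code>b</code> produces
<code>{"Id":1,"Actions":{"x":[{"AddDataobject":null,"ChangeMetadata":{"Table":"x","Columns":["a","b"]}}]}}</code>.
The unexported <code>tables</code> field never reaches disk.</p>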
<p>For our convenience we will track in memory a history of all previous
actions, a mapping of table columns, and a mapping of unflushed data
by table.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">transaction</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Id</span><span class="w"> </span><span class="kt">int</span>
<span class="w"> </span><span class="c1">// Both are mapping table name to a list of actions on the table.</span>
<span class="w"> </span><span class="nx">previousActions</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="nx">Action</span>
<span class="w"> </span><span class="nx">Actions</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="nx">Action</span>
<span class="w"> </span><span class="c1">// Mapping tables to column names.</span>
<span class="w"> </span><span class="nx">tables</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="kt">string</span>
<span class="w"> </span><span class="c1">// Mapping table name to unflushed/in-memory rows. When rows</span>
<span class="w"> </span><span class="c1">// are flushed, the dataobject that contains them is added to</span>
<span class="w"> </span><span class="c1">// `tx.Actions` above and `tx.unflushedDataPointer[table]` is</span>
<span class="w"> </span><span class="c1">// reset to `0`.</span>
<span class="w"> </span><span class="nx">unflushedData</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="p">[</span><span class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">unflushedDataPointer</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span>
<span class="p">}</span>
</pre></div>
<p>Only the current <code>transaction</code> will ever have
<code>transaction.previousActions</code> filled out. <code>transaction.tables</code> will be
populated when the transaction starts by reading through
<code>transaction.previousActions</code> for <code>ChangeMetadataAction</code>s, and we will
also add onto it when we create a table in the current transaction.</p>
<p>We will append to <code>transaction.Actions</code> every time we write a new data
file and every time we create a new table.</p>
<p>We will add rows to <code>transaction.unflushedData</code> for a table until
<code>transaction.unflushedDataPointer</code> for that table reaches
<code>DATAOBJECT_SIZE</code>, at which point we will write that data to disk and
add a <code>DataobjectAction</code> entry to <code>transaction.Actions</code>.</p>
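<p>The buffering rule can be sketched for a single table like this. It is an
illustration under assumptions, not the project's code:
<code>sketchDataobjectSize</code> stands in for <code>DATAOBJECT_SIZE</code>
with an arbitrary small value, <code>rowBuffer</code> is a made-up type, and a
"flush" just records the batch in memory instead of writing a data file.</p>

```go
package main

// sketchDataobjectSize stands in for DATAOBJECT_SIZE. The value 3 is an
// arbitrary choice to keep the example small.
const sketchDataobjectSize = 3

// rowBuffer is a hypothetical single-table version of unflushedData and
// unflushedDataPointer.
type rowBuffer struct {
	rows    [sketchDataobjectSize][]any
	pointer int
	// flushed records each batch in place of writing a data file.
	flushed [][][]any
}

// add stores a row and flushes as soon as the buffer is full.
func (b *rowBuffer) add(row []any) {
	b.rows[b.pointer] = row
	b.pointer++
	if b.pointer == sketchDataobjectSize {
		b.flush()
	}
}

// flush records the buffered rows as one batch and resets the pointer to 0.
// It does nothing when the buffer is empty.
func (b *rowBuffer) flush() {
	if b.pointer == 0 {
		return
	}
	batch := make([][]any, b.pointer)
	copy(batch, b.rows[:b.pointer])
	b.flushed = append(b.flushed, batch)
	b.pointer = 0
}
```

<p>Adding seven rows yields two full batches and leaves one row buffered.
That leftover row still needs a final <code>flush</code> before it would show
up in a data file.</p>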
<h3 id="client-boilerplate">Client boilerplate</h3><p>A <code>client</code> will consist of an <code>objectStorage</code> implementation and a
possibly nil <code>*transaction</code>. Nil means there is no current
transaction.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">client</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">os</span><span class="w"> </span><span class="nx">objectStorage</span>
<span class="w"> </span><span class="c1">// Current transaction, if any. Only one transaction per</span>
<span class="w"> </span><span class="c1">// client at a time. All reads and writes must be within a</span>
<span class="w"> </span><span class="c1">// transaction.</span>
<span class="w"> </span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">transaction</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">newClient</span><span class="p">(</span><span class="nx">os</span><span class="w"> </span><span class="nx">objectStorage</span><span class="p">)</span><span class="w"> </span><span class="nx">client</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">client</span><span class="p">{</span><span class="nx">os</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">var</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">errExistingTx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Existing Transaction"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">errNoTx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"No Transaction"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">errTableExists</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Table Exists"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">errNoTable</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"No Such Table"</span><span class="p">)</span>
<span class="p">)</span>
</pre></div>
<h4 id="client-or-database?">Client or database?</h4><p>In a previous version of my code I named this <code>client</code> struct
<code>database</code>. But that's misleading. There is no central database. There
is just the client and the blob storage.</p>
<p>Clients work with transactions directly and only when attempting to
commit does the blob storage abstraction let the client know if the
transaction succeeded or not.</p>
<h3 id="starting-a-transaction">Starting a transaction</h3><p>When we start a transaction, we will first read all existing
transactions from disk and accumulate the actions from each prior
transaction.</p>
<p>We will interpret <code>ChangeMetadataAction</code>s and materialize them into a
current view of all tables.</p>
<p>And we will assign a transaction ID to this transaction to be 1
greater than the largest existing transaction ID we see.</p>
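<p>As a small sketch of that last step, assume each log file is named with
the log prefix followed by the decimal transaction ID (the exact naming scheme
is an assumption here, as is the helper name <code>nextTxID</code>):</p>

```go
package main

import (
	"strconv"
	"strings"
)

// nextTxID returns one more than the largest transaction ID found in
// logNames. Each name is assumed to be logPrefix followed by a decimal ID.
// With no existing log files the first ID is 1.
func nextTxID(logNames []string, logPrefix string) (int, error) {
	maxID := 0
	for _, name := range logNames {
		id, err := strconv.Atoi(strings.TrimPrefix(name, logPrefix))
		if err != nil {
			return 0, err
		}
		if id > maxID {
			maxID = id
		}
	}
	return maxID + 1, nil
}
```

<p>Two clients that list the same set of log files compute the same next ID,
which is exactly why the commit has to go through <code>putIfAbsent</code>.</p>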
<p>Again it doesn't matter if the <code>listPrefix</code> call we use returns an
up-to-date list. Notably, blob storage offers few guarantees about how
recent the results of LIST operations are. The Delta Lake paper mentions this too.</p>
<p>Out-of-date transactions attempting to write will be caught when we go
to commit the transaction. Out-of-date transactions attempting only to
read will still read a consistent snapshot.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">newTx</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errExistingTx</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">logPrefix</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">"_log_"</span>
<span class="w"> </span><span class="nx">txLogFilenames</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">os</span><span class="p">.</span><span class="nx">listPrefix</span><span class="p">(</span><span class="nx">logPrefix</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">tx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&</span><span class="nx">transaction</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">previousActions</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="nx">Action</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="nx">Action</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="kt">string</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedData</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="p">[</span><span class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">txLogFilename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">txLogFilenames</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">os</span><span class="p">.</span><span class="nx">read</span><span class="p">(</span><span class="nx">txLogFilename</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">oldTx</span><span class="w"> </span><span class="nx">transaction</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">oldTx</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Transaction metadata files are sorted</span>
<span class="w"> </span><span class="c1">// lexicographically so that the most recent</span>
<span class="w"> </span><span class="c1">// transaction (i.e. the one with the largest</span>
<span class="w"> </span><span class="c1">// transaction id) will be last and tx.Id will end up</span>
<span class="w"> </span><span class="c1">// 1 greater than the most recent transaction ID we</span>
<span class="w"> </span><span class="c1">// see on disk.</span>
<span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">Id</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">oldTx</span><span class="p">.</span><span class="nx">Id</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">actions</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">oldTx</span><span class="p">.</span><span class="nx">Actions</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">action</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">actions</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">action</span><span class="p">.</span><span class="nx">AddDataobject</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">previousActions</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">tx</span><span class="p">.</span><span class="nx">previousActions</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span><span class="w"> </span><span class="nx">action</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">action</span><span class="p">.</span><span class="nx">ChangeMetadata</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Store the latest version of</span>
<span class="w"> </span><span class="c1">// each table in memory for</span>
<span class="w"> </span><span class="c1">// easy lookup.</span>
<span class="w"> </span><span class="nx">mtd</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">action</span><span class="p">.</span><span class="nx">ChangeMetadata</span>
<span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mtd</span><span class="p">.</span><span class="nx">Columns</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"unsupported action: %v"</span><span class="p">,</span><span class="w"> </span><span class="nx">action</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tx</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>And we're set.</p>
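<p>The ID assignment above depends on log object names sorting
lexicographically in transaction order. Here is a small sketch of why
that needs fixed-width names. The <code>_log_</code> prefix comes from
the code above, but the 20-digit zero padding and the helpers
<code>logName</code> and <code>nextTxID</code> are assumptions for
illustration, not necessarily what this project does.</p>

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// logName builds a log object name. The zero padding is an assumption
// of this sketch; it makes string order agree with numeric order.
func logName(id int) string {
	return fmt.Sprintf("_log_%020d", id)
}

// nextTxID returns one more than the ID encoded in the
// lexicographically last name. With no names it returns 0, matching
// newTx above, where tx.Id keeps its zero value.
func nextTxID(names []string) (int, error) {
	if len(names) == 0 {
		return 0, nil
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]string(nil), names...)
	sort.Strings(sorted)
	last := strings.TrimPrefix(sorted[len(sorted)-1], "_log_")
	id, err := strconv.Atoi(last)
	if err != nil {
		return 0, err
	}
	return id + 1, nil
}
```

<p>With padded names for IDs 2, 9 and 10, <code>nextTxID</code> returns
11. With unpadded names such as <code>_log_9</code> and
<code>_log_10</code>, the string <code>_log_9</code> sorts last, so the
same function returns 10 and the new transaction would collide with an
existing one at commit time.</p>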
<h3 id="creating-a-table">Creating a table</h3><p>When we create a table, we need to add a <code>ChangeMetadataAction</code> to the
transaction's <code>Actions</code>. And we also want to add the table info to the
in-memory <code>transaction.tables</code> field.</p>
<p>We don't do any of this durably. The change here will be written to
disk on commit (if the transaction succeeds).</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">createTable</span><span class="p">(</span><span class="nx">table</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errNoTx</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">exists</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">table</span><span class="p">];</span><span class="w"> </span><span class="nx">exists</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errTableExists</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Store it in the in-memory mapping.</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">columns</span>
<span class="w"> </span><span class="c1">// And also add it to the action history for future transactions.</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span><span class="w"> </span><span class="nx">Action</span><span class="p">{</span>
<span class="w"> </span><span class="nx">ChangeMetadata</span><span class="p">:</span><span class="w"> </span><span class="o">&</span><span class="nx">ChangeMetadataAction</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Table</span><span class="p">:</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Columns</span><span class="p">:</span><span class="w"> </span><span class="nx">columns</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>Easy peasy. Now for the fun part, writing data!</p>
<h3 id="writing-a-row">Writing a row</h3><p>This is the next area where we'll diverge from Delta Lake. For the
sake of zero dependencies we are going to store data in-memory as an
array of arrays of <code>any</code>. And when we later write rows to disk we'll
write them as JSON. A real Delta Lake implementation would store data
in-memory in Apache Arrow format, and write to disk as Parquet.</p>
<p>In line with Delta Lake though we will buffer data in memory until we
get 64K rows. When we get 64K rows for a particular table we will
flush all those rows to disk. (When we go to commit a transaction we
will flush any outstanding rows.)</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">writeRow</span><span class="p">(</span><span class="nx">table</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errNoTx</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">table</span><span class="p">];</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errNoTable</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Try to find an unflushed/in-memory dataobject for this table</span>
<span class="w"> </span><span class="nx">pointer</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedData</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&</span><span class="p">[</span><span class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span><span class="p">{}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">pointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">DATAOBJECT_SIZE</span><span class="w"> </span><span class="p">{</span>
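<span class="w"> </span><span class="c1">// Propagate flush failures instead of silently dropping them.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">flushRows</span><span class="p">(</span><span class="nx">table</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>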
<span class="w"> </span><span class="nx">pointer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedData</span><span class="p">[</span><span class="nx">table</span><span class="p">][</span><span class="nx">pointer</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">row</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="o">++</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
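<p>The buffering behavior can also be looked at in isolation. The
following is a simplified sketch, not the code above: <code>rowBuffer</code>
and <code>bufferSize</code> are hypothetical, the buffer holds 3 rows
instead of 64K, and "flushing" just appends a batch to a slice rather
than writing to blob storage.</p>

```go
package main

// bufferSize stands in for DATAOBJECT_SIZE. It is tiny here so the
// flush points are easy to follow.
const bufferSize = 3

// rowBuffer is a hypothetical, single-table version of the buffering
// done in writeRow.
type rowBuffer struct {
	rows    [bufferSize][]any
	pointer int       // index of the next free slot in rows
	flushed [][][]any // flushed batches, standing in for data objects
}

// flush copies the buffered rows into a new batch and resets the
// pointer. It does nothing when the buffer is empty.
func (b *rowBuffer) flush() {
	if b.pointer == 0 {
		return
	}
	batch := make([][]any, b.pointer)
	copy(batch, b.rows[:b.pointer])
	b.flushed = append(b.flushed, batch)
	b.pointer = 0
}

// write flushes first if the buffer is already full, then stores the
// row in the next free slot.
func (b *rowBuffer) write(row []any) {
	if b.pointer == bufferSize {
		b.flush()
	}
	b.rows[b.pointer] = row
	b.pointer++
}
```

<p>Writing 7 rows into this buffer produces 2 full batches and leaves 1
row pending. That pending row is why a final flush is still needed when
the transaction commits.</p>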
<p>Now let's implement flushing.</p>
<h3 id="flushing-a-data-object">Flushing a data object</h3><p>Recall that data objects in Delta Lake (and Iceberg) are
immutable. Once we've got enough data to write a data object, we give
it a unique name, write it to disk, and add an <code>AddDataobject</code> action to the
transaction's list of <code>Actions</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">dataobject</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Table</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">Data</span><span class="w"> </span><span class="p">[</span><span class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">Len</span><span class="w"> </span><span class="kt">int</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">flushRows</span><span class="p">(</span><span class="nx">table</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errNoTx</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// First write out dataobject if there is anything to write out.</span>
<span class="w"> </span><span class="nx">pointer</span><span class="p">,</span><span class="w"> </span><span class="nx">exists</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">exists</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">pointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">df</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dataobject</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Table</span><span class="p">:</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="nx">uuidv4</span><span class="p">(),</span>
<span class="w"> </span><span class="nx">Data</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedData</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span>
<span class="w"> </span><span class="nx">Len</span><span class="p">:</span><span class="w"> </span><span class="nx">pointer</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">df</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">os</span><span class="p">.</span><span class="nx">putIfAbsent</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"_table_%s_%s"</span><span class="p">,</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">df</span><span class="p">.</span><span class="nx">Name</span><span class="p">),</span><span class="w"> </span><span class="nx">bytes</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Then record the newly written data file.</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span><span class="w"> </span><span class="nx">Action</span><span class="p">{</span>
<span class="w"> </span><span class="nx">AddDataobject</span><span class="p">:</span><span class="w"> </span><span class="o">&</span><span class="nx">DataobjectAction</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Table</span><span class="p">:</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="nx">df</span><span class="p">.</span><span class="nx">Name</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="c1">// Reset in-memory pointer.</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
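<p>One consequence of storing a fixed-size array as JSON is worth
seeing in a sketch. The struct below mirrors the shape of
<code>dataobject</code> but is hypothetical: <code>miniObject</code>,
<code>objectSize</code> and <code>roundTrip</code> are names made up
for this example, and the array holds only 4 slots.</p>

```go
package main

import "encoding/json"

// objectSize stands in for DATAOBJECT_SIZE, kept tiny for the sketch.
const objectSize = 4

// miniObject mirrors the shape of the dataobject struct above.
type miniObject struct {
	Table string
	Name  string
	Data  [objectSize][]any
	Len   int
}

// roundTrip marshals a data object to JSON and decodes it again,
// returning both the JSON text and the decoded copy.
func roundTrip(in miniObject) (string, miniObject, error) {
	bytes, err := json.Marshal(in)
	if err != nil {
		return "", miniObject{}, err
	}
	var out miniObject
	if err := json.Unmarshal(bytes, &out); err != nil {
		return "", miniObject{}, err
	}
	return string(bytes), out, nil
}
```

<p>Unused slots in the array are nil slices, which
<code>encoding/json</code> writes as <code>null</code>, so the
<code>Len</code> field is what tells a reader how many rows are real.
Also note that a number stored in an <code>any</code> comes back from
<code>json.Unmarshal</code> as a <code>float64</code>, whatever its
type was when written.</p>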
<p>That's it for writing data! Let's now look at reading data.</p>
<h3 id="scanning-a-table">Scanning a table</h3><p>We're going to make scanning mildly more complicated than it needs to
be for pedagogical code because we'll have <code>client.scan()</code> return an
iterator rather than an array with all rows.</p>
<p>The <code>scanIterator</code> will first read from in-memory (unflushed)
data. And then it will read through every data object for the table
that is still a part of this transaction. We will know which data
objects are still a part of this transaction by reading through all
<code>AddDataobject</code> actions. A future version of this project would also
eliminate data object files from the list by observing
<code>DeleteDataobject</code> actions. But we don't do that in this post.</p>
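<p>As a rough sketch of what that future step might look like, the
function below replays a list of actions and keeps only the data
objects that were added and not later deleted. It is entirely
hypothetical: <code>logAction</code> is a flattened stand-in for the
real <code>Action</code> type, and the delete action does not exist
anywhere in this post's code.</p>

```go
package main

// logAction is a hypothetical, flattened action for this sketch.
// Add and Delete each hold a data object name, or "" when unset.
type logAction struct {
	Add    string
	Delete string // hypothetical: this post implements no delete action
}

// liveObjects replays actions in order and returns the names of data
// objects that were added and not later deleted, in insertion order.
func liveObjects(actions []logAction) []string {
	var names []string
	for _, a := range actions {
		switch {
		case a.Add != "":
			names = append(names, a.Add)
		case a.Delete != "":
			// Drop the first matching name, if there is one.
			for i, n := range names {
				if n == a.Delete {
					names = append(names[:i], names[i+1:]...)
					break
				}
			}
		}
	}
	return names
}
```

<p>Replaying "add a, add b, delete a, add c" leaves <code>b</code> and
<code>c</code>, so a scan would read only those two objects.</p>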
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">scan</span><span class="p">(</span><span class="nx">table</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">scanIterator</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">errNoTx</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">dataobjects</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="nx">allActions</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">previousActions</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">action</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">allActions</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">action</span><span class="p">.</span><span class="nx">AddDataobject</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dataobjects</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">dataobjects</span><span class="p">,</span><span class="w"> </span><span class="nx">action</span><span class="p">.</span><span class="nx">AddDataobject</span><span class="p">.</span><span class="nx">Name</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">unflushedRows</span><span class="w"> </span><span class="p">[</span><span class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">data</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedData</span><span class="p">[</span><span class="nx">table</span><span class="p">];</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">unflushedRows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">data</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">scanIterator</span><span class="p">{</span>
<span class="w"> </span><span class="nx">unflushedRows</span><span class="p">:</span><span class="w"> </span><span class="nx">unflushedRows</span><span class="p">,</span>
<span class="w"> </span><span class="nx">unflushedRowsLen</span><span class="p">:</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span>
<span class="w"> </span><span class="nx">d</span><span class="p">:</span><span class="w"> </span><span class="nx">d</span><span class="p">,</span>
<span class="w"> </span><span class="nx">table</span><span class="p">:</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span>
<span class="w"> </span><span class="nx">dataobjects</span><span class="p">:</span><span class="w"> </span><span class="nx">dataobjects</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>The <code>scanIterator</code> needs to track our position in the in-memory rows, in
the list of data objects, and within the current data object.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">scanIterator</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span>
<span class="w"> </span><span class="nx">table</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="c1">// First we iterate through unflushed rows.</span>
<span class="w"> </span><span class="nx">unflushedRows</span><span class="w"> </span><span class="p">[</span><span class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">unflushedRowsLen</span><span class="w"> </span><span class="kt">int</span>
<span class="w"> </span><span class="nx">unflushedRowPointer</span><span class="w"> </span><span class="kt">int</span>
<span class="w"> </span><span class="c1">// Then we move through each dataobject.</span>
<span class="w"> </span><span class="nx">dataobjects</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="nx">dataobjectsPointer</span><span class="w"> </span><span class="kt">int</span>
<span class="w"> </span><span class="c1">// And within each dataobject we iterate through rows.</span>
<span class="w"> </span><span class="nx">dataobject</span><span class="w"> </span><span class="o">*</span><span class="nx">dataobject</span>
<span class="w"> </span><span class="nx">dataobjectRowPointer</span><span class="w"> </span><span class="kt">int</span>
<span class="p">}</span>
</pre></div>
<p>And the <code>scanIterator</code> will be driven by a <code>next()</code> method that goes
through in-memory data first and then through what's on disk.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">readDataobject</span><span class="p">(</span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">dataobject</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">os</span><span class="p">.</span><span class="nx">read</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"_table_%s_%s"</span><span class="p">,</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">do</span><span class="w"> </span><span class="nx">dataobject</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">do</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">do</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="p">}</span>
<span class="c1">// returns (nil, nil) when done</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">si</span><span class="w"> </span><span class="o">*</span><span class="nx">scanIterator</span><span class="p">)</span><span class="w"> </span><span class="nx">next</span><span class="p">()</span><span class="w"> </span><span class="p">([]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Iterate through in-memory rows first.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">unflushedRowPointer</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">unflushedRowsLen</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">unflushedRows</span><span class="p">[</span><span class="nx">si</span><span class="p">.</span><span class="nx">unflushedRowPointer</span><span class="p">]</span>
<span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">unflushedRowPointer</span><span class="o">++</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// If we've gotten through all dataobjects on disk we're done.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectsPointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjects</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobject</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjects</span><span class="p">[</span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectsPointer</span><span class="p">]</span>
<span class="w"> </span><span class="nx">o</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">d</span><span class="p">.</span><span class="nx">readDataobject</span><span class="p">(</span><span class="nx">si</span><span class="p">.</span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobject</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">o</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Once every row in this dataobject has been returned, move to the next one.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectRowPointer</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobject</span><span class="p">.</span><span class="nx">Len</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectsPointer</span><span class="o">++</span>
<span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobject</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectRowPointer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">next</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobject</span><span class="p">.</span><span class="nx">Data</span><span class="p">[</span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectRowPointer</span><span class="p">]</span>
<span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectRowPointer</span><span class="o">++</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
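<p>To make the <code>(nil, nil)</code> convention concrete, here is a minimal, self-contained sketch of the same pattern: buffered rows first, then each flushed page in turn, with a caller that drains the iterator until it sees a nil row. The <code>rowIterator</code> type and the <code>collect</code> helper are hypothetical stand-ins for illustration; they use plain slices and are not part of the code in this post.</p>

```go
package main

import "fmt"

// rowIterator is a hypothetical, simplified stand-in for scanIterator:
// it yields buffered rows first, then rows from each flushed page.
type rowIterator struct {
	buffered [][]any   // stands in for unflushed rows
	pages    [][][]any // stands in for data objects already on disk
	bufPos   int
	pagePos  int
	rowPos   int
}

// next returns (nil, nil) once every row has been produced.
func (it *rowIterator) next() ([]any, error) {
	// In-memory rows come first.
	if it.bufPos < len(it.buffered) {
		row := it.buffered[it.bufPos]
		it.bufPos++
		return row, nil
	}
	// Skip past exhausted (or empty) pages.
	for it.pagePos < len(it.pages) && it.rowPos >= len(it.pages[it.pagePos]) {
		it.pagePos++
		it.rowPos = 0
	}
	if it.pagePos == len(it.pages) {
		return nil, nil
	}
	row := it.pages[it.pagePos][it.rowPos]
	it.rowPos++
	return row, nil
}

// collect drains the iterator, stopping at the nil-row sentinel.
func collect(it *rowIterator) ([][]any, error) {
	var rows [][]any
	for {
		row, err := it.next()
		if err != nil {
			return nil, err
		}
		if row == nil {
			return rows, nil
		}
		rows = append(rows, row)
	}
}

func main() {
	it := &rowIterator{
		buffered: [][]any{{"Joey", 1}},
		pages:    [][][]any{{{"Yue", 2}, {"Holly", 3}}, {}, {{"Ada", 4}}},
	}
	rows, err := collect(it)
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		fmt.Println(row...)
	}
}
```

<p>One consequence of using a nil row as the end marker is that a genuinely nil row in the data would end the scan early, so rows must always be non-nil slices.</p>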
<p>That's it for scanning a table! The final piece of the puzzle is
committing a transaction.</p>
<h3 id="committing-a-transaction">Committing a transaction</h3><p>When we commit a transaction we must flush any remaining data. A
read-only transaction (one which has no <code>Actions</code>) is immediately
done. There is no concurrency check.</p>
<p>Otherwise we will serialize transaction state and attempt to
atomically <code>putIfAbsent</code>.</p>
<p>Setting aside I/O errors, the only way this will fail is if a concurrent writer has already committed a transaction with the same ID.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">commitTx</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errNoTx</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Flush any outstanding data</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">table</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">flushRows</span><span class="p">(</span><span class="nx">table</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">wrote</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">actions</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">actions</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">wrote</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Read-only transaction, no need to do a concurrency check.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">wrote</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">filename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"_log_%020d"</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Id</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// We won't store previous actions, they will be recovered on</span>
<span class="w"> </span><span class="c1">// new transactions. So unset them. Honestly not totally</span>
<span class="w"> </span><span class="c1">// clear why.</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">previousActions</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">os</span><span class="p">.</span><span class="nx">putIfAbsent</span><span class="p">(</span><span class="nx">filename</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="p">)</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"unimplemented"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>This is the crux of Delta Lake. It's simple. And honestly it's a bit
shocking. Real Delta Lake does support automatic retries in some
cases. But primarily you are limited to a single writer per table,
even if the writers are writing non-conflicting rows. Iceberg is
basically the same here; it's just how metadata is tracked that
differs.</p>
<p class="note">
As mentioned in another note above, our implementation is actually
stricter than Delta Lake since it manages all table transaction logs
together. This means you can get snapshot isolation across all
tables (which Delta Lake doesn't support) but it will mean
significantly more contention and failed write transactions.
</p><p>The Delta Lake and Iceberg folks apparently wanted to avoid
FoundationDB (i.e. the Snowflake architecture, which is mentioned in
the Delta Lake paper) so much that they'd give up row-level
concurrency to be mostly serverless.</p>
<p>Is it worth it? Dunno. Delta Lake and Iceberg are getting massive
adoption. Many very smart people have worked, and continue to work, on
both. Moreover it is apparently what the market wants. Every
database-like product is implementing, or is planning to implement,
Delta Lake or Iceberg.</p>
<h3 id="trying-it-out">Trying it out</h3><p>Let's add a test in <code>main_test.go</code> to see what happens with concurrent
writers. Follow the comments and debug logs for details:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"testing"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">TestConcurrentTableWriters</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirTemp</span><span class="p">(</span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"test-database"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">RemoveAll</span><span class="p">(</span><span class="nx">dir</span><span class="p">)</span>
<span class="w"> </span><span class="nx">fos</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newFileObjectStorage</span><span class="p">(</span><span class="nx">dir</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c1Writer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newClient</span><span class="p">(</span><span class="nx">fos</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c2Writer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newClient</span><span class="p">(</span><span class="nx">fos</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Have c2Writer start up a transaction.</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2Writer</span><span class="p">.</span><span class="nx">newTx</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not start first c2 tx"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c2] new tx"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// But then have c1Writer start a transaction and commit it first.</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">newTx</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not start first c1 tx"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1] new tx"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">createTable</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"a"</span><span class="p">,</span><span class="w"> </span><span class="s">"b"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not create x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1] Created table"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">"Joey"</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not write first row"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1] Wrote row"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">"Yue"</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not write second row"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1] Wrote row"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">commitTx</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not commit tx"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1] Committed tx"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Now go back to c2 and write data.</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2Writer</span><span class="p">.</span><span class="nx">createTable</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"a"</span><span class="p">,</span><span class="w"> </span><span class="s">"b"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not create x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c2] Created table"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">"Holly"</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not write first row"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c2] Wrote row"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2Writer</span><span class="p">.</span><span class="nx">commitTx</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"concurrent commit must fail"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c2] tx not committed"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Try it out:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>otf
<span class="gp">$ </span>go<span class="w"> </span>mod<span class="w"> </span>tidy
<span class="gp">$ </span>go<span class="w"> </span><span class="nb">test</span><span class="w"> </span>-run<span class="w"> </span>TestConcurrentTableWriters<span class="w"> </span>--<span class="w"> </span>--debug
<span class="go">[DEBUG] [c2] new tx</span>
<span class="go">[DEBUG] [c1] new tx</span>
<span class="go">[DEBUG] [c1] Created table</span>
<span class="go">[DEBUG] [c1] Wrote row</span>
<span class="go">[DEBUG] [c1] Wrote row</span>
<span class="go">[DEBUG] [c1] Committed tx</span>
<span class="go">[DEBUG] [c2] Created table</span>
<span class="go">[DEBUG] [c2] Wrote row</span>
<span class="go">[DEBUG] [c2] tx not committed</span>
<span class="go">PASS</span>
<span class="go">ok otf 0.311s</span>
</pre></div>
<p>That's pretty cool.</p>
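<p>Why does the second commit fail? Here is a minimal sketch, assuming a commit claims a numbered log entry with an atomic create-if-absent write. The names <code>putIfAbsent</code>, <code>commit</code>, <code>errConflict</code>, <code>demoConflict</code> and the <code>_log_</code> file naming are illustrative assumptions rather than the exact code from this post, and a local <code>O_EXCL</code> file create stands in for an object store's conditional put:</p>

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// errConflict is a hypothetical error for this sketch; a real
// implementation may report the conflict differently.
var errConflict = errors.New("concurrent commit: log entry already exists")

// putIfAbsent creates name inside dir only if it does not exist yet.
// O_EXCL makes "create or fail" a single atomic step on a local filesystem.
func putIfAbsent(dir, name string, data []byte) error {
	f, err := os.OpenFile(filepath.Join(dir, name), os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	if errors.Is(err, os.ErrExist) {
		return errConflict
	}
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// commit tries to claim the log slot for transaction id.
// The "_log_" prefix and zero padding are assumptions of this sketch.
func commit(dir string, id int, payload string) error {
	return putIfAbsent(dir, fmt.Sprintf("_log_%020d", id), []byte(payload))
}

// demoConflict lets two writers that both saw an empty log race for slot 1.
func demoConflict() (first, second error) {
	dir, err := os.MkdirTemp("", "otf-sketch")
	if err != nil {
		return err, err
	}
	defer os.RemoveAll(dir)

	first = commit(dir, 1, "c1 actions")
	second = commit(dir, 1, "c2 actions")
	return first, second
}

func main() {
	first, second := demoConflict()
	fmt.Println("c1 commit error:", first)
	fmt.Println("c2 commit error:", second)
}
```

<p>Both writers started when the log was empty, so both try to claim entry 1. Whoever creates the entry first wins; the other gets an error and has to retry in a new transaction.</p>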
<p>And what about a reader and concurrent writer? Observe that the reader
always reads a snapshot. Follow the comments again for detail:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">TestConcurrentReaderWithWriterReadsSnapshot</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirTemp</span><span class="p">(</span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"test-database"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">RemoveAll</span><span class="p">(</span><span class="nx">dir</span><span class="p">)</span>
<span class="w"> </span><span class="nx">fos</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newFileObjectStorage</span><span class="p">(</span><span class="nx">dir</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c1Writer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newClient</span><span class="p">(</span><span class="nx">fos</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c2Reader</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newClient</span><span class="p">(</span><span class="nx">fos</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// First create some data and commit the transaction.</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">newTx</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not start first c1 tx"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1Writer] Started tx"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">createTable</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"a"</span><span class="p">,</span><span class="w"> </span><span class="s">"b"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not create x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1Writer] Created table"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">"Joey"</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not write first row"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1Writer] Wrote row"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">"Yue"</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not write second row"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1Writer] Wrote row"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">commitTx</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not commit tx"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1Writer] Committed tx"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Now start a new transaction for more edits.</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">newTx</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not start second c1 tx"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1Writer] Starting new write tx"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Before we commit this second write-transaction, start a</span>
<span class="w"> </span><span class="c1">// read transaction.</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2Reader</span><span class="p">.</span><span class="nx">newTx</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not start c2 tx"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c2Reader] Started tx"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Write a third row in c1; the commit comes later.</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">"Ada"</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not write third row"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1Writer] Wrote third row"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Scan x in read-only transaction</span>
<span class="w"> </span><span class="nx">it</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2Reader</span><span class="p">.</span><span class="nx">scan</span><span class="p">(</span><span class="s">"x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not scan x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c2Reader] Started scanning"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">seen</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">it</span><span class="p">.</span><span class="nx">next</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not iterate x scan"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c2Reader] Done scanning"</span><span class="p">)</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c2Reader] Got row in reader tx"</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">seen</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="s">"Joey"</span><span class="p">,</span><span class="w"> </span><span class="s">"row mismatch in c2"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="mf">1.0</span><span class="p">,</span><span class="w"> </span><span class="s">"row mismatch in c2"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="s">"Yue"</span><span class="p">,</span><span class="w"> </span><span class="s">"row mismatch in c2"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="mf">2.0</span><span class="p">,</span><span class="w"> </span><span class="s">"row mismatch in c2"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">seen</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">seen</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="s">"expected two rows"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Scan x in c1 write transaction</span>
<span class="w"> </span><span class="nx">it</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">scan</span><span class="p">(</span><span class="s">"x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not scan x in c1"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1Writer] Started scanning"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">seen</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">it</span><span class="p">.</span><span class="nx">next</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not iterate x scan in c1"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1Writer] Done scanning"</span><span class="p">)</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1Writer] Got row in tx"</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">seen</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="s">"Ada"</span><span class="p">,</span><span class="w"> </span><span class="s">"row mismatch in c1"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Since this hasn't been serialized to JSON, it's still an int not a float.</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="s">"row mismatch in c1"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">seen</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="s">"Joey"</span><span class="p">,</span><span class="w"> </span><span class="s">"row mismatch in c1"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="mf">1.0</span><span class="p">,</span><span class="w"> </span><span class="s">"row mismatch in c1"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="s">"Yue"</span><span class="p">,</span><span class="w"> </span><span class="s">"row mismatch in c1"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="mf">2.0</span><span class="p">,</span><span class="w"> </span><span class="s">"row mismatch in c1"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">seen</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">seen</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="s">"expected three rows"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Writer committing should succeed.</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">commitTx</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not commit second tx"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c1Writer] Committed tx"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Reader committing should succeed.</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2Reader</span><span class="p">.</span><span class="nx">commitTx</span><span class="p">()</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"could not commit read-only tx"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"[c2Reader] Committed tx"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Run it:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>go<span class="w"> </span><span class="nb">test</span><span class="w"> </span>-run<span class="w"> </span>TestConcurrentReaderWithWriterReadsSnapshot<span class="w"> </span>--<span class="w"> </span>--debug
<span class="go">[DEBUG] [c1Writer] Started tx</span>
<span class="go">[DEBUG] [c1Writer] Created table</span>
<span class="go">[DEBUG] [c1Writer] Wrote row</span>
<span class="go">[DEBUG] [c1Writer] Wrote row</span>
<span class="go">[DEBUG] [c1Writer] Committed tx</span>
<span class="go">[DEBUG] [c1Writer] Starting new write tx</span>
<span class="go">[DEBUG] [c2Reader] Started tx</span>
<span class="go">[DEBUG] [c1Writer] Wrote third row</span>
<span class="go">[DEBUG] [c2Reader] Started scanning</span>
<span class="go">[DEBUG] [c2Reader] Got row in reader tx [Joey 1]</span>
<span class="go">[DEBUG] [c2Reader] Got row in reader tx [Yue 2]</span>
<span class="go">[DEBUG] [c2Reader] Done scanning</span>
<span class="go">[DEBUG] [c1Writer] Started scanning</span>
<span class="go">[DEBUG] [c1Writer] Got row in tx [Ada 3]</span>
<span class="go">[DEBUG] [c1Writer] Got row in tx [Joey 1]</span>
<span class="go">[DEBUG] [c1Writer] Got row in tx [Yue 2]</span>
<span class="go">[DEBUG] [c1Writer] Done scanning</span>
<span class="go">[DEBUG] [c1Writer] Committed tx</span>
<span class="go">[DEBUG] [c2Reader] Committed tx</span>
<span class="go">PASS</span>
<span class="go">ok otf 0.252s</span>
</pre></div>
<p>Sweet.</p>
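<p>One way to picture the snapshot behavior is a reader that pins the log position when its transaction starts and replays only the entries up to that point. The sketch below is an in-memory illustration under that assumption; <code>txLog</code>, <code>reader</code>, <code>newReader</code> and <code>demoSnapshot</code> are hypothetical names, not the types used in this post:</p>

```go
package main

import "fmt"

// txLog is a hypothetical in-memory stand-in for the committed
// transaction log: entry i holds the rows added by transaction i+1.
type txLog struct {
	committed [][]string
}

// commit appends one transaction's rows to the log.
func (l *txLog) commit(rows ...string) {
	l.committed = append(l.committed, rows)
}

// reader remembers how many transactions were committed when it started.
type reader struct {
	log      *txLog
	snapshot int
}

// newReader pins the current end of the log as the reader's snapshot.
func (l *txLog) newReader() *reader {
	return &reader{log: l, snapshot: len(l.committed)}
}

// scan replays only the transactions committed before the reader
// started, ignoring anything appended to the log afterwards.
func (r *reader) scan() []string {
	var rows []string
	for _, tx := range r.log.committed[:r.snapshot] {
		rows = append(rows, tx...)
	}
	return rows
}

// demoSnapshot starts one reader before a second commit and one after it.
func demoSnapshot() (before, after []string) {
	log := &txLog{}
	log.commit("Joey", "Yue")

	r := log.newReader() // snapshot taken here
	log.commit("Ada")    // lands after the snapshot

	return r.scan(), log.newReader().scan()
}

func main() {
	before, after := demoSnapshot()
	fmt.Println("reader started before second commit:", before)
	fmt.Println("reader started after second commit:", after)
}
```

<p>The first reader never sees the third row, even though it scans after that row is committed. A reader that starts later sees all three.</p>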
<h3 id="what's-next?">What's next?</h3><p>As mentioned, we didn't touch a lot of things: handling updates and
deletes, transaction log checkpoints, data object compaction, and so on.</p>
<p>Take a close look at the <a href="https://www.vldb.org/pvldb/vol13/p3411-armbrust.pdf">Delta Lake
paper</a> and the
<a href="https://github.com/delta-io/delta/blob/master/PROTOCOL.md">Delta Lake
Spec</a> and
see what you can do!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Build a serverless ACID database with this one neat trick.<br><br>(New blog post)<a href="https://t.co/rHgfKSPY6q">https://t.co/rHgfKSPY6q</a> <a href="https://t.co/1hmjsxIk6w">pic.twitter.com/1hmjsxIk6w</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1840474893491560777?ref_src=twsrc%5Etfw">September 29, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-09-29-build-a-serverless-acid-database-with-this-one-neat-trick.htmlSun, 29 Sep 2024 00:00:00 +0000
- Be someone who does thingshttp://notes.eatonphil.com/2024-09-23-be-someone-who-does-things.html<p>I <a href="https://notes.eatonphil.com/2024-08-24-obsession.html">wrote last
month</a> that
<em>what you want to do</em> is one of the most useful motivations in life. I
want to follow that up by saying that the only thing more important
than wanting to do something is to <em>actually</em> do something.</p>
<p>The most valuable trait you can develop for yourself is to be
consistent. It is absolutely something you can develop. And moreover
it's kind of hard to believe that it is innate for anyone.</p>
<p>I meet so many people who say they want to do things. And I ask them
what they're doing to get there and they get flustered. This is
completely understandable.</p>
<p>I meet so many students who feel overwhelmed by what everyone else is
doing. This is also understandable.</p>
<p>But it doesn't matter what anyone else is doing. It doesn't matter
where anyone else is at. It matters where you are at. Compete with
yourself before you compete with anyone else. What matters is that you
get into a habit of consistently working on little goals.</p>
<p>If you pick something that is too complex, break it down. Keep on
breaking problems or ideas down until you find a problem or idea you
can solve.</p>
<p>Then keep on finding new problems to solve. Move on in complexity over
time as you can and want to.</p>
<p>Don't worry about getting things perfect. Who can discredit you for
doing your best? What shame is there when you're being earnest? The
only thing that makes sense to feel bad about is not <em>trying to do</em>
what you <em>genuinely wanted to do</em>.</p>
<p>And this doesn't have to be about projects or ideas outside of
work. There may be things you want to do at work like improving
documentation or writing better tests or adding new checks to code or
blogging or interviewing customers or working with another team.</p>
<p>Like I said in
<a href="https://notes.eatonphil.com/2024-08-24-obsession.html">Obsession</a>,
don't worry about what you do daily. That is too frequent to think
about. Instead think about what you're doing once a month.</p>
<p>Make time once a month to publish a post or complete a small
project. Whatever you want to do, I am confident you can find some
small version of it that you could commit to doing once a month. Be
consistent!</p>
<p>If a month is too often, pick a longer frequency. Find whatever cadence
and whatever size of project that allows you to be consistent.</p>
<p>When you're consistent over the course of months I think you'll be
astounded at what you accomplish in a year.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Shorter post tonight, may add to this later on.<br><br>Be someone who does things. And do these (little) things consistently.<a href="https://t.co/oVb6Sz8eEK">https://t.co/oVb6Sz8eEK</a> <a href="https://t.co/kNrZQ4pvTN">pic.twitter.com/kNrZQ4pvTN</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1838378171005128910?ref_src=twsrc%5Etfw">September 24, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-09-23-be-someone-who-does-things.htmlMon, 23 Sep 2024 00:00:00 +0000
- Obsessionhttp://notes.eatonphil.com/2024-08-24-obsession.html<p>In your professional and personal life, I don't believe there is a stronger motivation than having something in mind and the desire to do it. Yet the natural way to deal with a desire to do something is to justify why it's not possible.</p>
<p>"I want to read more books but nobody reads books these days so how could I."</p>
<p>"I want to write for a magazine but I have no experience writing professionally."</p>
<p>"I want to build a company someday but how could someone of my background."</p>
<p>Our official mentors, our managers, through a combination of well-intentioned defeatism and well-intentioned lack of accomplishment themselves, among other things, are often unable to process big goals or guide you toward them.</p>
<p>I've been one of these managers myself. In fact I have, to my immense regret, tried too often to convince people to do what is practical rather than what they want to do. Or to do what I judged they were capable of doing rather than what they wanted to do.</p>
<p>In the best cases, my listener had the self-confidence to ignore me. They did what they wanted to do anyway. In the worst case, again to my deep regret, I've been a well-intentioned part of derailing someone's career for years.</p>
<p>So I don't want to convince anyone of anything anymore. If I start trying to convince someone by accident, I try to catch myself. I try to avoid sentences like "I think you should …". Instead "Here is something that's worked for me: …" or "Here is what I've heard works well for other people: …".</p>
<p>Nobody wants to be convinced. But intelligent people will change their mind when exposed to new facts or different ideas. Being convinced is a battle of will. Changing one's mind is a purely personal decision.</p>
<p>There are certainly people with discipline who can grind on things they hate doing and eventually become experts at it. But more often I see people grind on things they hate only to become depressed and give up.</p>
<p>For most of us, our best hope is (healthy) obsession. And obsession, in the sense I'm talking about, does not come from something you are ambivalent about or hate. Obsession can only come when you're doing something you actually want to do.</p>
<p>For big goals or big changes, you need regular commitment weekly, monthly, yearly. Over the course of years. And only obsession makes that work not actually feel like work. Obsession is the only thing that makes discipline not feel like discipline.</p>
<p>That big goals take years to accomplish need not be scary. Obsession doesn't mean you can't pivot. There is quite a lot to gain by committing to something regularly over the course of years even if you decide to stop and commit from then on to something else. You will learn a good deal.</p>
<p>And healthy obsession to me is more specifically measurable on the order of weeks, not hours or days. Healthy obsession means you're still building healthy personal and professional relationships. You're still taking care of yourself, emotionally and physically.</p>
<p>I do not have high expectations for people in general. This seems healthy and reasonable. But as I meet more people and observe them over the years, I am only more convinced of the vast potential of individuals. Individuals are almost universally underestimated.</p>
<p>I think you can do almost anything you want to do. If you commit to doing it.</p>
<p>I'll end this with a personal story.</p>
<p>Until 11th grade, I hated school. I hated the rigidity. Being forced to be somewhere for hours and to follow so many rules. I skipped so many days of school I'm embarrassed by it. I'd never do homework at home. I never studied for tests. I got Bs and Cs in the second-tier classes. I was in the orchestra for 6 years and never practiced at home. I was not cool enough to be a "bad kid" but I did not understand the system and had no discipline whatsoever.</p>
<p>I found out at the end of 10th grade that I could actually afford college if I got into a good enough school that paid full needs-based tuition. It sounded significantly better than the only other option that seemed obvious, joining the military as a recruit. I realized and decided that if I wanted to get into a good school I needed to not half-ass things.</p>
<p>Somehow, I decided to only do things I could become obsessed with. And I decided to be obsessed in the way that I wanted, not to do what everyone else did (which I basically could not do since I had no discipline). If we covered a topic in class, I'd read news about it or watch movies about it. I'd get myself excited about the topic in every way I could.</p>
<p>It basically worked out. I ended high school in the top 10% of the class (up from top 40% or something). I got into a good liberal arts college that paid the entirety of my tuition. But I remained a basically lazy and undisciplined person. I never stayed up late studying for a test. I dropped out after a year and a half for family reasons.</p>
<p>But I've now spent the last 10 years in my spare time working on compiler projects, interpreter projects, parser projects, database projects, distributed systems projects. I've spent the last 6 years consistently publishing at least one blog post per month.</p>
<p>I didn't want to work the way everyone else worked. I wanted to be obsessed about what I worked on.</p>
<p>Obsession has made all of this into something I now barely register as doing. It's allowed me to continue adding activities like organizing book clubs and meetups to the list of things I'm up to. Up until basically this year I could have in good faith said I am a very lazy and undisciplined person. But obsession turned me into someone with discipline.</p>
<p>Obsession became about more than just the tech. It meant trying to fully understand the product, the users, the market. It meant thinking more carefully about product documentation, user interfaces, company messaging. Obsession meant reflecting on how I treat my coworkers, and how my coworkers feel treated by others in general. Obsession meant wanting an equitable and encouraging work environment for everyone.</p>
<p>And, as I said, it's about healthy obsession. I didn't really understand the "healthy" part until a few years ago. But I'm now convinced that the "healthy" part is as important as the "obsession" part. To go to the gym regularly. To play pickup volleyball. To cook excellent food. To read fiction and poetry and play music. To serve the community. To be friendly and encouraging to all people. To meet new people and build better genuine friendships.</p>
<p>And in the context of work, "healthy obsession" means understanding you can't do everything, even while you care about everything. It means accepting that you make mistakes and that you do your best; that you try to do better and learn from mistakes the next time.</p>
<p>It's got to be sustainable. And we can develop a healthy obsession while we have quite a bit of fun too. :)</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote an essay on my mistakes trying to convince people to do something, on doing what you want to do, and on obsession.<br><br>Ended with a personal note on developing healthy discipline, and having fun. :)<a href="https://t.co/4WWdtU6AhL">https://t.co/4WWdtU6AhL</a> <a href="https://t.co/lBw7zlqWeq">pic.twitter.com/lBw7zlqWeq</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1827373730781147241?ref_src=twsrc%5Etfw">August 24, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-08-24-obsession.htmlSat, 24 Aug 2024 00:00:00 +0000
- What's the big deal about Deterministic Simulation Testing?http://notes.eatonphil.com/2024-08-20-deterministic-simulation-testing.html<p>Bugs in distributed systems are hard to find, largely because systems
interact in chaotic ways. And even once you've found a bug, it can be
anywhere from simple to impossible to reproduce it. It's about as far
away as you can get from the ideal test environment: property testing
a pure function.</p>
<p>But what if we could write our code in a way that we can isolate the
chaotic aspects of our distributed system during <i>testing</i>: run
multiple systems communicating with each other on a <i>single
thread</i> and control all randomness in each system? And property
test this single-threaded version of the distributed system with
controlled randomness, all the while injecting faults (fancy term for
unhappy path behavior like errors and latency) we might see in the
real-world?</p>
<p>Crazy as it sounds, people actually do this. It's called Deterministic
Simulation Testing (DST). And it's become more and more popular, with
startups like FoundationDB, Antithesis, TigerBeetle, Polar Signals,
and WarpStream, as well as folks like Tyler Neely and Pekka Enberg,
talking about and making use of this technique.</p>
<p>It has become so popular to talk about DST in my corner of the world
that I worry it risks coming off sounding too magical and maybe a
little hyped. It's worth getting a better understanding of both the
benefits and the limitations.</p>
<p>Thank you to <a href="https://www.linkedin.com/in/alexmillerdb/">Alex Miller</a>
and <a href="https://www.linkedin.com/in/will-wilson-330276112/">Will Wilson</a>
for reviewing a version of this post.</p>
<h3 id="randomness-and-time">Randomness and time</h3><p>A big source of non-determinism in business logic is the use of random
numbers—in your code or your transitive dependencies or your language
runtime or your operating system.</p>
<p>Crucially, DST does not imply you can't have randomness! DST merely
assumes that you have a global seed for all randomness in your program
and that the simulator controls the seed. The seed may change across
runs of the simulator.</p>
<p>Once you observe a bad state as a result of running the simulation on
a random seed, you allow the user to enter the same seed again. This
allows the user to recreate the entire program run that led to that
observed bad state, and so to debug the program far more easily.</p>
<p>Another big source of non-determinism is being dependent on time. As
with randomness, DST does not mean you can't depend on time. DST means
you must be able to control the clock during the simulation.</p>
<p>To "control" randomness or time basically means you support dependency
injection, or the old-school alternative to dependency injection
called <i>passing the dependency as an explicit parameter</i>. Rather
than referring to a global clock or a global seed, you need to be able
to receive a clock or a seed from someone.</p>
<p>For example we might separate the operation of an application into the
language's <code>main()</code> entrypoint and an actual application <code>start()</code>
entrypoint.</p>
<div class="highlight"><pre><span></span><span class="c1"># app.pseudocode</span>
<span class="k">def</span> <span class="nf">start</span><span class="p">(</span><span class="n">clock</span><span class="p">,</span> <span class="n">seed</span><span class="p">):</span>
  <span class="c1"># lots of business logic that might depend on time or do random things</span>

<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
  <span class="n">clock</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">clock</span><span class="p">()</span>
  <span class="n">seed</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
  <span class="n">app</span><span class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">clock</span><span class="p">,</span> <span class="n">seed</span><span class="p">)</span>
</pre></div>
<p>The application entrypoint is where we must be able to
swap out a real clock or real random seed for one controlled by our
simulator:</p>
<div class="highlight"><pre><span></span><span class="c1"># sim.pseudocode</span>
<span class="kn">import</span> <span class="s2">"app.pseudocode"</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
  <span class="n">sim_clock</span> <span class="o">=</span> <span class="n">make_sim_clock</span><span class="p">()</span>
  <span class="n">sim_seed</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span> <span class="ow">or</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
  <span class="k">try</span><span class="p">:</span>
    <span class="n">app</span><span class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">sim_clock</span><span class="p">,</span> <span class="n">sim_seed</span><span class="p">)</span>
  <span class="n">catch</span><span class="p">(</span><span class="n">e</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Bad execution at seed: </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">sim_seed</span><span class="p">)</span>
    <span class="n">throw</span> <span class="n">e</span>
</pre></div>
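<p>To make the injection concrete, here is a minimal runnable sketch in
Python. The names <code>RealClock</code>, <code>SimClock</code>, and
<code>sim_main</code> are made up for illustration, and <code>start()</code>
is only a stand-in for real business logic. The one thing taken from the
pseudocode above is the <code>DST_SEED</code> environment variable.</p>

```python
import os
import random
import time


class RealClock:
    """Production clock: defers to the operating system."""

    def now(self) -> float:
        return time.time()


class SimClock:
    """Simulated clock: time only moves when the simulator says so."""

    def __init__(self) -> None:
        self._now = 0.0

    def now(self) -> float:
        return self._now

    def tick(self, seconds: float) -> None:
        self._now += seconds


def start(clock, seed: int) -> list:
    """Hypothetical stand-in for business logic using time and randomness."""
    rng = random.Random(seed)  # all randomness flows from the injected seed
    events = []
    for _ in range(3):
        events.append((clock.now(), rng.randint(0, 1_000_000)))
    return events


def sim_main() -> None:
    # Reuse a seed from the environment to replay a failing run.
    seed = int(os.environ.get("DST_SEED", time.time_ns()))
    try:
        first = start(SimClock(), seed)
        second = start(SimClock(), seed)
        assert first == second, "same seed and clock must replay identically"
    except Exception:
        print(f"Bad execution at seed: {seed}")
        raise
```

<p>In production you would call <code>start(RealClock(), seed)</code>; the
simulator calls the same function with a <code>SimClock</code> and a seed it
controls. Nothing inside <code>start()</code> knows which one it got.</p>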
<p>Let's look at another example.</p>
<h3 id="converting-an-existing-function">Converting an existing function</h3><p>Let's say that we had a helper method that kept calling a function
until it succeeded, with backoff.</p>
<div class="highlight"><pre><span></span><span class="c1"># retry.pseudocode</span>
<span class="k">class</span> <span class="nc">Backoff</span><span class="p">:</span>
  <span class="k">def</span> <span class="nf">init</span><span class="p">(</span><span class="n">this</span><span class="p">):</span>
    <span class="n">this</span><span class="o">.</span><span class="n">rnd</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">seed</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">())</span>
    <span class="n">this</span><span class="o">.</span><span class="n">tries</span> <span class="o">=</span> <span class="mi">0</span>

  <span class="k">async</span> <span class="k">def</span> <span class="nf">retry_backoff</span><span class="p">(</span><span class="n">this</span><span class="p">,</span> <span class="n">f</span><span class="p">):</span>
    <span class="k">while</span> <span class="n">this</span><span class="o">.</span><span class="n">tries</span> <span class="o"><</span> <span class="mi">3</span><span class="p">:</span>
      <span class="k">if</span> <span class="n">f</span><span class="p">():</span>
        <span class="k">return</span>
      <span class="k">await</span> <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">rnd</span><span class="o">.</span><span class="n">gen</span><span class="p">())</span>
      <span class="n">this</span><span class="o">.</span><span class="n">tries</span><span class="o">++</span>
</pre></div>
<p>There is a single source of nondeterminism here and it's where we
generate a seed. We could parameterize the seed, but since we want to
call <code>time.sleep()</code> and since in DST we control the time, we can just
parameterize <code>time</code>.</p>
<div class="highlight"><pre><span></span><span class="c1"># retry.pseudocode</span>
<span class="k">class</span> <span class="nc">Backoff</span><span class="p">:</span>
  <span class="k">def</span> <span class="nf">init</span><span class="p">(</span><span class="n">this</span><span class="p">,</span> <span class="n">time</span><span class="p">):</span>
    <span class="n">this</span><span class="o">.</span><span class="n">time</span> <span class="o">=</span> <span class="n">time</span>
    <span class="n">this</span><span class="o">.</span><span class="n">rnd</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">seed</span> <span class="o">=</span> <span class="n">this</span><span class="o">.</span><span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">())</span>
    <span class="n">this</span><span class="o">.</span><span class="n">tries</span> <span class="o">=</span> <span class="mi">0</span>

  <span class="k">async</span> <span class="k">def</span> <span class="nf">retry_backoff</span><span class="p">(</span><span class="n">this</span><span class="p">,</span> <span class="n">f</span><span class="p">):</span>
    <span class="k">while</span> <span class="n">this</span><span class="o">.</span><span class="n">tries</span> <span class="o"><</span> <span class="mi">3</span><span class="p">:</span>
      <span class="k">if</span> <span class="n">f</span><span class="p">():</span>
        <span class="k">return</span>
      <span class="k">await</span> <span class="n">this</span><span class="o">.</span><span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">rnd</span><span class="o">.</span><span class="n">gen</span><span class="p">())</span>
      <span class="n">this</span><span class="o">.</span><span class="n">tries</span><span class="o">++</span>
</pre></div>
<p>Now we can write a little simulator to test this:</p>
<div class="highlight"><pre><span></span><span class="c1"># sim.pseudocode</span>
<span class="kn">import</span> <span class="s2">"retry.pseudocode"</span>

<span class="c1"># Reuse a seed from the environment to replay a failing run.</span>
<span class="n">seed</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span> <span class="ow">or</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
<span class="n">rnd</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span>

<span class="n">sim_time</span> <span class="o">=</span> <span class="p">{</span>
  <span class="n">now</span><span class="p">:</span> <span class="mi">0</span>
  <span class="n">sleep</span><span class="p">:</span> <span class="p">(</span><span class="n">ms</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
    <span class="k">await</span> <span class="n">future</span><span class="o">.</span><span class="n">wait</span><span class="p">(</span><span class="n">ms</span><span class="p">)</span>
  <span class="p">}</span>
  <span class="n">tick</span><span class="p">:</span> <span class="p">(</span><span class="n">ms</span><span class="p">)</span> <span class="o">=></span> <span class="n">now</span> <span class="o">+=</span> <span class="n">ms</span>
<span class="p">}</span>

<span class="n">backoff</span> <span class="o">=</span> <span class="n">Backoff</span><span class="p">(</span><span class="n">sim_time</span><span class="p">)</span>

<span class="k">while</span> <span class="n">true</span><span class="p">:</span>
  <span class="n">failures</span> <span class="o">=</span> <span class="mi">0</span>
  <span class="n">f</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
    <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">></span> <span class="mf">0.5</span><span class="p">:</span>
      <span class="n">failures</span><span class="o">++</span>
      <span class="k">return</span> <span class="n">false</span>
    <span class="k">return</span> <span class="n">true</span>
  <span class="p">}</span>
  <span class="k">try</span><span class="p">:</span>
    <span class="k">while</span> <span class="n">sim_time</span><span class="o">.</span><span class="n">now</span> <span class="o"><</span> <span class="mi">60</span><span class="nb">min</span><span class="p">:</span>
      <span class="n">promise</span> <span class="o">=</span> <span class="n">backoff</span><span class="o">.</span><span class="n">retry_backoff</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
      <span class="n">sim_time</span><span class="o">.</span><span class="n">tick</span><span class="p">(</span><span class="mi">1</span><span class="n">ms</span><span class="p">)</span>
      <span class="k">if</span> <span class="n">promise</span><span class="o">.</span><span class="n">read</span><span class="p">():</span>
        <span class="k">break</span>
    <span class="n">assert_expect_failure_and_expected_time_elapse</span><span class="p">(</span><span class="n">sim_time</span><span class="p">,</span> <span class="n">failures</span><span class="p">)</span>
  <span class="n">catch</span><span class="p">(</span><span class="n">e</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Found logical error with seed: </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="n">seed</span><span class="p">)</span>
    <span class="n">throw</span> <span class="n">e</span>
</pre></div>
<p>This demonstrates a few critical aspects of DST. First, the simulator
itself depends on randomness, but it allows the user to provide a seed so
they can replay a simulation that discovers a bug. The controlled
randomness in the simulator is what lets us do property testing.</p>
<p>Second, the simulation workload must be written by the user. Even when
you've got a platform like Antithesis that gives you an environment
for DST, it's up to you to exercise the application.</p>
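<p>Here is roughly the same idea as a runnable Python sketch. To keep it
short I made it synchronous rather than async, which is a simplification
of the pseudocode above: <code>sleep()</code> on the simulated clock just
advances virtual time. <code>SimTime</code>, <code>run_once</code>, and
<code>simulate</code> are names I invented for this sketch, and the
properties asserted are only examples of what you might check.</p>

```python
import random


class SimTime:
    """Virtual clock: sleep() advances time instantly instead of blocking."""

    def __init__(self) -> None:
        self.now_ms = 0
        self.sleeps = []

    def now(self) -> int:
        return self.now_ms

    def sleep(self, ms: int) -> None:
        self.sleeps.append(ms)
        self.now_ms += ms


class Backoff:
    def __init__(self, time, rng: random.Random, max_tries: int = 3) -> None:
        self.time = time
        self.rng = rng
        self.max_tries = max_tries

    def retry(self, f) -> bool:
        """Call f until it succeeds or max_tries attempts have failed."""
        for _ in range(self.max_tries):
            if f():
                return True
            self.time.sleep(self.rng.randint(1, 100))
        return False


def run_once(seed: int) -> None:
    rng = random.Random(seed)  # the only source of randomness in the run
    sim_time = SimTime()
    failures = 0

    def flaky() -> bool:
        nonlocal failures
        if rng.random() > 0.5:
            failures += 1
            return False
        return True

    ok = Backoff(sim_time, rng).retry(flaky)

    # Properties that should hold for every seed.
    assert len(sim_time.sleeps) == failures
    assert sim_time.now() == sum(sim_time.sleeps)
    assert failures <= 3
    assert ok == (failures < 3)


def simulate(first_seed: int, runs: int) -> None:
    for seed in range(first_seed, first_seed + runs):
        try:
            run_once(seed)
        except AssertionError:
            print(f"Found logical error with seed: {seed}")
            raise
```

<p>Calling <code>simulate(0, 1000)</code> covers a thousand seeds without
ever really sleeping, and any failing seed can be replayed on its own with
<code>run_once(seed)</code>.</p>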
<p>Now let's get a little more complex.</p>
<h3 id="a-single-thread-and-asynchronous-io">A single thread and asynchronous IO</h3><p>The determinism of multiple threads can only be controlled at the
operating system or emulator or hypervisor layer. Realistically, that
would require third-party systems like Antithesis or
<a href="https://github.com/facebookexperimental/hermit">Hermit</a> (which, don't
get excited, is not actively developed and hasn't worked on any
interesting program of mine) or <a href="https://rr-project.org/">rr</a>.</p>
<p>These systems transparently transform multi-threaded code into
single-threaded code. But also note that Hermit and rr have only limited
ability to do fault injection which, in addition to deterministic
execution, is a goal of ours. And you can't run them on a Mac. And you
<a href="https://github.com/rr-debugger/rr/issues/1373">can't</a>
<a href="https://github.com/facebookexperimental/hermit?tab=readme-ov-file#support">run</a>
them on ARM.</p>
<p>But we can, and would like to, write a simulator without writing a new
operating system or emulator or hypervisor, and without a third-party
system. So we must limit ourselves to writing code that can be
collapsed into a single thread. Significantly, since using blocking IO
would mean an entire class of concurrency bugs could not be discovered
while running the simulator in a single thread, we must limit
ourselves to asynchronous IO.</p>
<p>Single threaded and asynchronous IO. These are already two big limitations.</p>
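<p>One way to picture "collapsed into a single thread" is a toy scheduler
in Python where generators stand in for async tasks and every
<code>yield</code> marks a point where a task would wait on IO. This is
only a sketch under my own assumptions (<code>worker</code> and
<code>run</code> are invented names), not how any of the projects mentioned
here implement it. The point is that the interleaving of tasks becomes a
function of the seed.</p>

```python
import random


def worker(name: str, steps: int, log: list):
    """A task that yields control wherever it would wait on IO."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the scheduler


def run(seed: int) -> list:
    """Run all tasks on one thread, choosing the next task from the seed."""
    rng = random.Random(seed)
    log = []
    tasks = [worker("a", 3, log), worker("b", 3, log), worker("c", 3, log)]
    while tasks:
        task = rng.choice(tasks)  # the only scheduling decision, seeded
        try:
            next(task)
        except StopIteration:
            tasks.remove(task)  # task finished, stop scheduling it
    return log
```

<p>Running <code>run(seed)</code> twice with the same seed produces the same
interleaving, which is what lets a concurrency bug found under one seed be
replayed exactly.</p>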
<p>Some languages like Go are entirely built around transparent
multi-threading and blocking IO. Polar Signals
<a href="https://www.polarsignals.com/blog/posts/2024/05/28/mostly-dst-in-go">solved</a>
this for DST by compiling their application to WASM where it would run
on a single thread. But that wasn't enough. Even on a single thread,
the Go runtime intentionally schedules goroutines randomly. So Polar
Signals forked the Go runtime to control this randomness with an
environment variable. That's kind of crazy. Resonate took <a href="https://github.com/resonatehq/resonate/blob/268c588e302f13187309e4b37636d19595d42fa1/internal/kernel/scheduler/coroutine.go">another
approach</a>
that also looks cumbersome. I'm not going to attempt to describe
it. Go seems like a difficult choice of a language if you want to do
DST.</p>
<p>Like Go, Rust has no builtin async IO. The most mature async IO
library is tokio. The tokio folks attempted to provide a
tokio-compatible <a href="https://github.com/tokio-rs/simulator">simulator</a>
implementation with all sources of nondeterminism removed. From what I
can tell, they did not at any point fully
<a href="https://github.com/tokio-rs/tokio/issues/1845">succeed</a>. That repo
has now been replaced with a "this is very experimental" tokio-rs
project called <a href="https://github.com/tokio-rs/turmoil">turmoil</a> that
provides deterministic execution plus network fault injection. (But
not disk fault injection. More on that later.) It isn't surprising
that it is difficult to provide deterministic execution for an IO
library that was not designed for it. tokio is a large project with
many transitive dependencies. They must all be combed for
non-determinism.</p>
<p>On the other hand, Pekka has <a href="https://github.com/penberg/hiisi/blob/main/hiisi-server/src/io/generic.rs">already
demonstrated</a>
for us how we might build a simpler Rust async IO library that is
designed to be simulation tested. He modeled this on the TigerBeetle
design King and I
<a href="https://tigerbeetle.com/blog/a-friendly-abstraction-over-iouring-and-kqueue">wrote</a>
about two years ago.</p>
<p>So let's sketch out a program that does buggy IO and let's look at how
we can apply DST to it.</p>
<div class="highlight"><pre><span></span><span class="c1"># readfile.pseudocode</span>
<span class="k">def</span> <span class="nf">read_file</span><span class="p">(</span><span class="n">io</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">into_buffer</span><span class="p">):</span>
  <span class="n">f</span> <span class="o">=</span> <span class="k">await</span> <span class="n">io</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
  <span class="n">read_buffer</span> <span class="o">=</span> <span class="p">[</span><span class="mi">4096</span><span class="p">]</span><span class="n">u8</span><span class="p">{}</span>
  <span class="k">while</span> <span class="n">true</span><span class="p">:</span>
    <span class="n">err</span><span class="p">,</span> <span class="n">n_read</span> <span class="o">=</span> <span class="k">await</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="o">&</span><span class="n">read_buffer</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">err</span> <span class="o">==</span> <span class="n">io</span><span class="o">.</span><span class="n">EOF</span><span class="p">:</span>
      <span class="n">into_buffer</span><span class="o">.</span><span class="n">copy_maybe_allocate</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">sizeof</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">)])</span>
      <span class="k">return</span>
    <span class="k">if</span> <span class="n">err</span><span class="p">:</span>
      <span class="n">throw</span> <span class="n">err</span>
    <span class="n">into_buffer</span><span class="o">.</span><span class="n">copy_maybe_allocate</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">sizeof</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">)])</span>
</pre></div>
<p>In our simulator, we will provide a mocked out IO system and we will
randomly inject various errors while asserting pre- and
post-conditions.</p>
<div class="highlight"><pre><span></span><span class="c1"># sim.pseudocode</span>
<span class="kn">import</span> <span class="s2">"readfile.pseudocode"</span>

<span class="n">seed</span> <span class="o">=</span> <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span> <span class="err">?</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span><span class="p">)</span> <span class="p">:</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
<span class="n">rnd</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span>

<span class="k">while</span> <span class="n">true</span><span class="p">:</span>
  <span class="n">sim_disk_data</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand_bytes</span><span class="p">(</span><span class="mi">10</span><span class="n">MB</span><span class="p">)</span>
  <span class="n">sim_fd</span> <span class="o">=</span> <span class="p">{</span>
    <span class="n">pos</span><span class="p">:</span> <span class="mi">0</span>
    <span class="n">EOF</span><span class="p">:</span> <span class="n">Error</span><span class="p">(</span><span class="s2">"eof"</span><span class="p">)</span>
    <span class="n">read</span><span class="p">:</span> <span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">buf</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
      <span class="n">partial_read</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand_in_range_inclusive</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">))</span>
      <span class="n">memcpy</span><span class="p">(</span><span class="n">sim_disk_data</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">fd</span><span class="o">.</span><span class="n">pos</span><span class="p">,</span> <span class="n">partial_read</span><span class="p">)</span>
      <span class="n">fd</span><span class="o">.</span><span class="n">pos</span> <span class="o">+=</span> <span class="n">partial_read</span>
      <span class="k">if</span> <span class="n">fd</span><span class="o">.</span><span class="n">pos</span> <span class="o">==</span> <span class="n">sizeof</span><span class="p">(</span><span class="n">sim_disk_data</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">io</span><span class="o">.</span><span class="n">EOF</span><span class="p">,</span> <span class="n">partial_read</span>
      <span class="k">return</span> <span class="n">nil</span><span class="p">,</span> <span class="n">partial_read</span>
    <span class="p">}</span>
  <span class="p">}</span>
  <span class="n">sim_io</span> <span class="o">=</span> <span class="p">{</span>
    <span class="nb">open</span><span class="p">:</span> <span class="p">(</span><span class="n">filename</span><span class="p">)</span> <span class="o">=></span> <span class="n">sim_fd</span>
  <span class="p">}</span>

  <span class="n">out_buf</span> <span class="o">=</span> <span class="n">Vector</span><span class="o"><</span><span class="n">u8</span><span class="o">>.</span><span class="n">new</span><span class="p">()</span>
  <span class="k">try</span><span class="p">:</span>
    <span class="n">read_file</span><span class="p">(</span><span class="n">sim_io</span><span class="p">,</span> <span class="s2">"somefile"</span><span class="p">,</span> <span class="n">out_buf</span><span class="p">)</span>
    <span class="n">assert_bytes_equal</span><span class="p">(</span><span class="n">out_buf</span><span class="o">.</span><span class="n">data</span><span class="p">,</span> <span class="n">sim_disk_data</span><span class="p">)</span>
  <span class="n">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Found logical error with seed: </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="n">seed</span><span class="p">)</span>
    <span class="n">throw</span> <span class="n">e</span>
</pre></div>
<p>And with this simulator we would have eventually caught our partial
read bug! In our original program when we wrote:</p>
<div class="highlight"><pre><span></span> <span class="n">into_buffer</span><span class="o">.</span><span class="n">copy_maybe_allocate</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">sizeof</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">)])</span>
</pre></div>
<p>We should have written:</p>
<div class="highlight"><pre><span></span> <span class="n">into_buffer</span><span class="o">.</span><span class="n">copy_maybe_allocate</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">n_read</span><span class="p">])</span>
</pre></div>
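<p>If you want to see the partial read bug caught for real, here is a small
runnable Python sketch of the same setup. It is synchronous and much
simpler than the pseudocode: <code>SimFile</code>, <code>read_file</code>'s
<code>use_n_read</code> flag, and <code>check</code> are all hypothetical
names for this illustration, and a return value of 0 stands in for EOF.</p>

```python
import random


class SimFile:
    """In-memory file whose reads return a random, possibly short, count."""

    def __init__(self, data: bytes, rng: random.Random) -> None:
        self.data = data
        self.pos = 0
        self.rng = rng

    def read_into(self, buf: bytearray) -> int:
        remaining = len(self.data) - self.pos
        # Injected fault: hand back fewer bytes than the buffer can hold.
        n = min(self.rng.randint(1, len(buf)), remaining)
        buf[:n] = self.data[self.pos:self.pos + n]
        self.pos += n
        return n  # 0 signals end of file


def read_file(f: SimFile, use_n_read: bool) -> bytes:
    out = bytearray()
    buf = bytearray(4096)
    while True:
        n = f.read_into(buf)
        if n == 0:
            return bytes(out)
        # The buggy variant copies the whole buffer and ignores n.
        out += buf[:n] if use_n_read else buf


def check(seed: int, use_n_read: bool) -> bool:
    """Return True if the bytes read back match the simulated disk."""
    rng = random.Random(seed)
    data = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
    return read_file(SimFile(data, rng), use_n_read) == data
```

<p>With <code>use_n_read=False</code> the check fails as soon as a read
comes back short, and the seed that exposed it can be passed straight back
to <code>check()</code> to replay the failure.</p>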
<p>Great! Let's get a little more complex.</p>
<h3 id="a-distributed-system">A distributed system</h3><p>I already mentioned in the beginning that the gist of deterministic
simulation testing a distributed system is that you get all of the
nodes in the system to run in the same process. This would be
basically impossible if you wanted to test a system that involved your
application plus Kafka plus Postgres plus Redis. But if your system is
a self-contained distributed system, such as one that embeds a Raft
library for high availability of your application, you can actually
run multiple nodes in the same process!</p>
<p>For a system like this, our simulator might look like:</p>
<div class="highlight"><pre><span></span><span class="c1"># sim.pseudocode</span>
<span class="kn">import</span> <span class="s2">"distsys-node.pseudocode"</span>
<span class="n">seed</span> <span class="o">=</span> <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span> <span class="err">?</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span><span class="p">)</span> <span class="p">:</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">()</span>
<span class="n">rnd</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span>
<span class="k">while</span> <span class="n">true</span><span class="p">:</span>
  <span class="n">sim_fd</span> <span class="o">=</span> <span class="p">{</span>
    <span class="n">send</span><span class="p">:</span> <span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">buf</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
      <span class="c1"># Inject random failure.</span>
      <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">></span> <span class="mf">.5</span><span class="p">:</span>
        <span class="n">throw</span> <span class="n">Error</span><span class="p">(</span><span class="s1">'bad write'</span><span class="p">)</span>
      <span class="c1"># Inject random latency.</span>
      <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">></span> <span class="mf">.5</span><span class="p">:</span>
        <span class="k">await</span> <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">())</span>
      <span class="n">n_written</span> <span class="o">=</span> <span class="n">assert_ok</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">fd</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">buf</span><span class="p">))</span>
      <span class="k">return</span> <span class="n">n_written</span>
    <span class="p">},</span>
    <span class="n">recv</span><span class="p">:</span> <span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">buf</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
      <span class="c1"># Inject random failure.</span>
      <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">></span> <span class="mf">.5</span><span class="p">:</span>
        <span class="n">throw</span> <span class="n">Error</span><span class="p">(</span><span class="s1">'bad read'</span><span class="p">)</span>
      <span class="c1"># Inject random latency.</span>
      <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">></span> <span class="mf">.5</span><span class="p">:</span>
        <span class="k">await</span> <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">())</span>
      <span class="k">return</span> <span class="n">os</span><span class="o">.</span><span class="n">fd</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span>
    <span class="p">}</span>
  <span class="p">}</span>
  <span class="n">sim_io</span> <span class="o">=</span> <span class="p">{</span>
    <span class="nb">open</span><span class="p">:</span> <span class="p">(</span><span class="n">filename</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
      <span class="c1"># Inject random failure.</span>
      <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">></span> <span class="mf">.5</span><span class="p">:</span>
        <span class="n">throw</span> <span class="n">Error</span><span class="p">(</span><span class="s1">'bad open'</span><span class="p">)</span>
      <span class="c1"># Inject random latency.</span>
      <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">></span> <span class="mf">.5</span><span class="p">:</span>
        <span class="k">await</span> <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">())</span>
      <span class="k">return</span> <span class="n">sim_fd</span>
    <span class="p">}</span>
  <span class="p">}</span>
  <span class="n">all_ports</span> <span class="o">=</span> <span class="p">[</span><span class="mi">6000</span><span class="p">,</span> <span class="mi">6001</span><span class="p">,</span> <span class="mi">6002</span><span class="p">]</span>
  <span class="n">nodes</span> <span class="o">=</span> <span class="p">[</span>
    <span class="k">await</span> <span class="n">distsys</span><span class="o">-</span><span class="n">node</span><span class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">sim_io</span><span class="p">,</span> <span class="n">all_ports</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">all_ports</span><span class="p">),</span>
    <span class="k">await</span> <span class="n">distsys</span><span class="o">-</span><span class="n">node</span><span class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">sim_io</span><span class="p">,</span> <span class="n">all_ports</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">all_ports</span><span class="p">),</span>
    <span class="k">await</span> <span class="n">distsys</span><span class="o">-</span><span class="n">node</span><span class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">sim_io</span><span class="p">,</span> <span class="n">all_ports</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">all_ports</span><span class="p">),</span>
  <span class="p">]</span>
  <span class="n">history</span> <span class="o">=</span> <span class="p">[]</span>
  <span class="k">try</span><span class="p">:</span>
    <span class="n">key</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand_bytes</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
    <span class="n">value</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand_bytes</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
    <span class="n">nodes</span><span class="p">[</span><span class="n">rnd</span><span class="o">.</span><span class="n">rand_in_range_inclusive</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)]</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span>
    <span class="n">history</span><span class="o">.</span><span class="n">add</span><span class="p">((</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span>
    <span class="n">assert_valid_history</span><span class="p">(</span><span class="n">nodes</span><span class="p">,</span> <span class="n">history</span><span class="p">)</span>
    <span class="c1"># Crash a process every so often</span>
    <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">></span> <span class="mf">0.75</span><span class="p">:</span>
      <span class="n">node</span> <span class="o">=</span> <span class="n">nodes</span><span class="p">[</span><span class="n">rnd</span><span class="o">.</span><span class="n">rand_in_range_inclusive</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)]</span>
      <span class="n">node</span><span class="o">.</span><span class="n">restart</span><span class="p">()</span>
  <span class="n">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">"Found logical error with seed: </span><span class="si">%d</span><span class="s2">"</span><span class="p">,</span> <span class="n">seed</span><span class="p">)</span>
    <span class="n">throw</span> <span class="n">e</span>
</pre></div>
<p>I'm completely hand-waving here to demonstrate the broader point and
not any specific testing strategy for a specific distributed
system. The important point is that these three nodes run in the
same process, on different ports.</p>
<p>We control disk IO. We control network IO. We control how time
elapses. We run a deterministic simulated workload against the three
node system while injecting disk, network, and process faults.</p>
<p>And we are constantly checking for an invalid state. When we get the
invalid state, we can be sure the user can easily recreate this
invalid state.</p>
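<p>To make the reproducibility claim concrete, here is a minimal
Python sketch under heavy assumptions: the three "nodes" are plain
dictionaries, the only faults are dropped writes and restarts that
lose in-memory state, and <code>run_simulation</code> is a hypothetical
name. The one property it is meant to show is that every random
decision flows from a single seeded generator, so the same seed
yields the same history:</p>

```python
import random

def run_simulation(seed, steps=50):
    """Run a toy three-node workload; every random choice comes from one seeded RNG."""
    rng = random.Random(seed)
    nodes = [{}, {}, {}]  # three in-memory "nodes" in a single process
    history = []
    for _ in range(steps):
        key = rng.randrange(10)
        value = rng.randrange(1000)
        target = rng.randrange(len(nodes))  # always a valid index
        # Inject a network fault: the write never reaches the node.
        if rng.random() > 0.5:
            history.append(("dropped", target, key, value))
            continue
        nodes[target][key] = value
        history.append(("applied", target, key, value))
        # Crash a node every so often; this toy node loses its in-memory state.
        if rng.random() > 0.75:
            crashed = rng.randrange(len(nodes))
            nodes[crashed] = {}
            history.append(("restarted", crashed))
    return history
```

<p>Calling <code>run_simulation(42)</code> twice returns identical
histories, which is what lets a failing seed be handed to someone else
to rerun.</p>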
<h3 id="other-sources-of-non-determinism">Other sources of non-determinism</h3><p>Within some error margin, most CPU instructions and most CPU behavior are
considered to be deterministic. There are, however, certain CPU
instructions that are <a href="https://cs.stackexchange.com/questions/132842/under-which-conditions-a-given-program-is-deterministic-on-x86-64-machines/132856#132856">definitely
not</a>. Unfortunately
that might
<a href="https://github.com/facebookexperimental/hermit/issues/34">include</a>
system calls. It might also
<a href="https://stackoverflow.com/a/8171032">include</a> malloc. There is very
little to trust.</p>
<p>If we <a href="https://antithesis.com/blog/deterministic_hypervisor/">ignore</a>
Antithesis, people doing DST seem not to worry about these smaller
bits of non-determinism. Yet it's generally agreed that DST is still
worthwhile. The intuition here is that every bit of
non-determinism you can eliminate makes it that much easier to
reproduce bugs when you find them.</p>
<p>Put another way: determinism, even among DST practitioners, remains a spectrum.</p>
<h3 id="considerations">Considerations</h3><p>As you may have noticed already from some of the pseudocode, DST is not a panacea.</p>
<h4 id="consideration-1:-edges">Consideration 1: Edges</h4><p>First, because you must swap out non-deterministic parts of your code,
you are not actually testing the entirety of your code. You are
certainly encouraged to keep the deterministic kernel large. But there
will always be the non-deterministic edges.</p>
<p>Without a system like Antithesis which gives you an entire
deterministic machine, you can't test your whole program.</p>
<p>But even with Antithesis you cannot test the <i>integration</i> between your
system and external systems. You must mock out the external systems.</p>
<p>It's also worth noting that there are many areas where you could
inject simulation. You could do it at a high-level RPC and storage
layer. This would be simpler and easier to understand. But then you'd
be omitting testing and error-handling of lower-level errors.</p>
<h4 id="consideration-2:-your-workload(s)">Consideration 2: Your workload(s)</h4><p>DST depends on the creativity and thoroughness of your workload
as much as any other type of test or benchmark does.</p>
<p>Just as you wouldn't depend on one single benchmark to qualify your
application, you may not want to depend on a single simulated
workload.</p>
<p>Or as Will Wilson put it for me:</p>
<blockquote><p>The biggest challenge of DST in my experience is that tuning all the
random distributions, the parameters of your system, the workload,
the fault injection, etc. so that it produces interesting behavior
is very challenging and very labor intensive. As with fuzzing or
PBT, it's terrifyingly easy to build a DST system that appears to be
doing a ton of testing, but actually never explores very much of the
state space of your system. At FoundationDB, the vast majority of
the work we put into the simulator was an iterative process of
hunting for what wasn't being covered by our tests and then figuring
out how to make the tests better. This process often resembles
science more than it does engineering.</p>
<p>Unfortunately, unlike with fuzzing, mere branch coverage in your
code is usually a pretty poor signal for the kinds of systems you
want to test with DST. At Antithesis we handle this with <a
href="https://antithesis.com/docs/best_practices/sometimes_assertions.html">Sometimes
assertions</a>, at FDB we did something pretty similar, and I assume
TigerBeetle and others have their own version of this. But of course
the ultimate figure of merit is whether your DST system is finding
100% of your bugs. It's quite difficult to get to the point that it
does. The truly ambitious part of Antithesis isn't the hypervisor,
but the fact that we also aim to solve the much harder "is my DST
working?" problem with minimal human guidance or supervision.</p>
</blockquote>
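<p>The general idea of a "sometimes" check, a condition you expect to
be true at least once across your runs, can be sketched in a few lines
of Python. This is not the Antithesis API or FoundationDB's
implementation; <code>SometimesTracker</code> and
<code>run_workload</code> are hypothetical names, and the workload is
deliberately too narrow so that one condition is never reached:</p>

```python
import random

class SometimesTracker:
    """Hypothetical helper: remembers whether each named condition was ever true."""

    def __init__(self):
        self.seen = {}

    def sometimes(self, name, condition):
        self.seen[name] = self.seen.get(name, False) or bool(condition)

    def never_seen(self):
        return sorted(name for name, hit in self.seen.items() if not hit)

def run_workload(seed, tracker, steps=100):
    rng = random.Random(seed)
    queue = []
    for _ in range(steps):
        # This toy workload only ever enqueues, so "queue drained" can never happen.
        queue.append(rng.randrange(100))
        tracker.sometimes("queue has more than 10 items", len(queue) > 10)
        tracker.sometimes("queue drained after use", len(queue) == 0)
```

<p>After running the workload, <code>tracker.never_seen()</code> lists
the conditions the simulation never reached, which is a hint that the
workload needs tuning rather than proof that the code is correct.</p>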
<h4 id="consideration-3:-your-knowledge-of-what-you-mocked">Consideration 3: Your knowledge of what you mocked</h4><p>When you mock out the behavior of disk or network IO, the benefits of
DST are tied to your understanding of the spectrum of behavior that
may happen in the real world.</p>
<p>What are all possible error conditions? What are the extreme latency
bounds of the original method? What about corruption or misdirected
IO?</p>
<p>The flip side here is that only in deterministic simulation testing can
you make these crazy scenarios happen at a <i>configurable
regularity</i>. You can kick off a set of runs that have especially
high IO latency or especially high corrupt reads/writes. Joran and I
<a href="https://tigerbeetle.com/blog/2023-07-11-we-put-a-distributed-database-in-the-browser">wrote</a>
a year ago about how the TigerBeetle simulator does exactly this.</p>
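<p>As an illustration of configurable regularity (a sketch only, not
how the TigerBeetle simulator is implemented; <code>FaultProfile</code>,
<code>sim_read</code>, and <code>count_outcomes</code> are hypothetical
names, and the default rates are arbitrary), fault probabilities can
be plain parameters that you turn up for a particular batch of
runs:</p>

```python
import random
from dataclasses import dataclass

@dataclass
class FaultProfile:
    """Hypothetical knobs for one batch of simulation runs."""
    error_rate: float = 0.01     # chance a read fails outright
    corrupt_rate: float = 0.001  # chance a read returns a damaged byte
    max_latency_ms: int = 10     # upper bound on injected latency

def sim_read(rng, profile, data):
    """Return (result, latency_ms); result is None when the read fails."""
    latency_ms = rng.randint(0, profile.max_latency_ms)
    if profile.error_rate > rng.random():
        return None, latency_ms
    if data and profile.corrupt_rate > rng.random():
        i = rng.randrange(len(data))
        corrupted = bytearray(data)
        corrupted[i] ^= 0xFF  # flip every bit of one byte
        return bytes(corrupted), latency_ms
    return data, latency_ms

def count_outcomes(seed, profile, reads=1000):
    """Tally how many simulated reads succeed, fail, or come back corrupted."""
    rng = random.Random(seed)
    data = b"payload"
    counts = {"ok": 0, "failed": 0, "corrupt": 0}
    for _ in range(reads):
        result, _latency = sim_read(rng, profile, data)
        if result is None:
            counts["failed"] += 1
        elif result != data:
            counts["corrupt"] += 1
        else:
            counts["ok"] += 1
    return counts
```

<p>A run with <code>FaultProfile(corrupt_rate=1.0, error_rate=0.0)</code>
corrupts every read, while the defaults leave most reads intact; both
stay reproducible because the profile and the seed fully determine the
outcome.</p>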
<h4 id="consideration-4:-non-reproducible-seeds-as-code-changes">Consideration 4: Non-reproducible seeds as code changes</h4><p>Critically, the reproducibility of DST only helps so long as your <i>code
doesn't change</i>. As soon as your code changes, the seed may no longer
even get you to the state where the bug was exhibited. So the
reproducibility of DST is better understood as something that may
help you convert the seeded simulation run into an integration test
that describes the precise scenario even as the code changes.</p>
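<p>One way to picture that conversion, as a sketch with hypothetical
names (<code>workload_from_seed</code>, <code>apply_ops</code>,
<code>PINNED_OPS</code>) and a made-up scenario: the seeded generator
produces operations that depend on how the code draws from the random
number generator, whereas a pinned list of operations copied out of a
run's log keeps describing the same scenario no matter how the
generator changes later.</p>

```python
import random

def workload_from_seed(seed, n_ops=4):
    """Hypothetical generator: the ops depend on how this code draws from the RNG."""
    rng = random.Random(seed)
    return [(rng.choice(["insert", "delete"]), rng.randrange(3)) for _ in range(n_ops)]

def apply_ops(ops):
    """Apply (op, key) pairs to a toy set-based store and return its final state."""
    store = set()
    for op, key in ops:
        if op == "insert":
            store.add(key)
        else:
            store.discard(key)
    return store

# Made-up scenario standing in for ops copied out of a failing run's log.
PINNED_OPS = [("insert", 1), ("insert", 2), ("delete", 1), ("insert", 1)]

def test_pinned_scenario():
    # Independent of any seed: this keeps exercising the same sequence
    # even if workload_from_seed later draws random numbers differently.
    assert apply_ops(PINNED_OPS) == {1, 2}
```

<p>The seed is still what found the scenario; the pinned test is what
keeps it around after the next refactor.</p>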
<h4 id="consideration-5:-time-and-compute">Consideration 5: Time and compute</h4><p>Because of Consideration 4, you need to keep rerunning the simulator
not just to keep finding new seeds and new histories but because the
history a given seed produces may change every time you change the
code.</p>
<h3 id="what-about-jepsen?">What about Jepsen?</h3><p>Jepsen does limited process and network fault injection while testing
for linearizability. It's a fantastic project.</p>
<p>However, it represents only a subset of what is possible with
Deterministic Simulation Testing (if you actually put in the effort
described above to get there).</p>
<p>But even more importantly, Jepsen has nothing to do with deterministic
execution. If Jepsen finds a bug and your system can't do
deterministic execution, you may or may not be able to reproduce that
Jepsen bug.</p>
<p>Here's another Will Wilson
<a href="https://antithesis.com/blog/is_something_bugging_you/">quote</a> for you
on Jepsen and FoundationDB:</p>
<blockquote><p>Anyway, we did [Deterministic Simulation Testing] for a while and
found all of the bugs in the database. I know, I know, that’s an
insane thing to say. It’s kind of true though. In the entire history
of the company, I think we only ever had one or two bugs reported by
a customer. Ever. Kyle Kingsbury aka “aphyr” didn’t even bother
testing it with Jepsen, because he didn’t think he’d find anything.</p>
</blockquote>
<h3 id="conclusion">Conclusion</h3><p>The degree to which you can place faith in DST alone, and not time
spent in production, has limits. However, it certainly does no harm to
employ DST. And, barring the considerations described above, it will
likely make the kernel of your product significantly more
stable. Furthermore, everyone who uses DST knows about these
considerations. But I think it's worthwhile to list them out to help
folks who do not know DST to build an intuition for what it's
excellent at.</p>
<p>Further reading:</p>
<ul>
<li><a href="https://www.youtube.com/watch?v=4fFDFbi3toc">"Testing Distributed Systems w/ Deterministic Simulation" by Will Wilson</a></li>
<li><a href="https://www.polarsignals.com/blog/posts/2024/05/28/mostly-dst-in-go">(Mostly) Deterministic Simulation Testing in Go</a></li>
<li><a href="https://github.com/madsim-rs/madsim">Magical Deterministic Simulator for distributed systems in Rust</a></li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new post talking through the basics, considerations, and limitations of Deterministic Simulation Testing.<a href="https://t.co/9Fp5ytL7Wz">https://t.co/9Fp5ytL7Wz</a> <a href="https://t.co/xRE6FOwc0P">pic.twitter.com/xRE6FOwc0P</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1825851204632445377?ref_src=twsrc%5Etfw">August 20, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-08-20-deterministic-simulation-testing.htmlTue, 20 Aug 2024 00:00:00 +0000
- Delightful, production-grade replication for Postgreshttp://notes.eatonphil.com/2024-07-30-delightful-production-grade-replication-postgres.html<head>
<meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/delightful-production-grade-replication-postgres'" />
</head><p>This is an external post of mine. Click
<a href="https://www.enterprisedb.com/blog/delightful-production-grade-replication-postgres">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2024-07-30-delightful-production-grade-replication-postgres.htmlTue, 30 Jul 2024 00:00:00 +0000
- A reawakening of systems programming meetupshttp://notes.eatonphil.com/2024-07-07-systems-meetups.html<p>This year has seen a resurgence in really high quality systems
programming meetups. <a href="https://www.meetup.com/munich-database-meetup/">Munich Database
Meetup</a>, <a href="https://lu.ma/8ujc7st3?tk=DAAbmn">Berlin
Systems Group</a>, <a href="https://lu.ma/t6r4mi4v">SF Distributed
Systems Meetup</a>, <a href="https://nycsystems.xyz/">NYC
Systems</a>, <a href="https://twitter.com/BengaluruSys">Bengaluru
Systems</a>, to name a few.</p>
<p>This post summarizes a bit of disappointing recent tech meetup
history, the new trend of excellent systems programming meetups, and
ends with some encouragement and guidance for running your own systems
programming events.</p>
<p>I will be a little critical in this post but I want to preface by
saying: organizing meetups is really tough! It takes a lot of work and
I have a huge amount of respect for meetup organizers even when their
meetup style did not resonate with me.</p>
<p>Although much of this post talks about NYC Systems, the reason I think
this post is worth writing is that so many other meetups in a
similar vein have popped up. I hope to encourage these other meetups and to
encourage folks in other major metros (London, for example) to start
similar meetups.</p>
<h3 id="meetups">Meetups</h3><p>I used to attend a bunch of meetups before the pandemic. But I quickly
got disillusioned. Almost every meetup was varying degrees of startups
pitching their product. The last straw for me was sitting through a talk
at a JavaScript meetup that was by a devrel employee of a startup who
literally gave a tutorial for their product.</p>
<p>There were also some pretty intelligent meetups like the New York
Haskell Users Group and the New York Emacs Meetup. But not being an
expert in either domain, and the attendees almost solely appearing to
be experts, I didn't particularly enjoy going.</p>
<p>There were a couple of meetups that felt inclusive for various
skill-levels of attendees yet still went into interesting
depth. Specifically, <a href="http://www.nylug.org/">New York Linux User
Group</a> and <a href="https://paperswelove.org/chapter/newyork/">Papers We Love
NYC</a>.</p>
<p>These meetups were exceptional because they were language- and
framework-agnostic, they would start broad to give you background, but
then go deep into a topic. Maybe you only understood 50% of what was
covered. But you got exposed to something new from an expert in that
domain.</p>
<p>Unfortunately, the pandemic happened and these two excellent meetups
basically have not come back.</p>
<h3 id="a-couple-of-students-in-munich">A couple of students in Munich</h3><p>The pandemic ended and I tried a couple of meetups I thought might be
better quality. Rust and Go. But they weren't much better than I
remembered. People would give a high level talk and brush over all the
interesting concepts.</p>
<p>I had been thinking of doing an in-person talk series since 2022.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">If I put together a systems/databases/distributed systems meetup in NYC (a physical meetup, not Zoom), who'd be interested (in attending, or presenting, or helping me organize, or donating space)?<br><br>No promises!</p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1574875016067710976?ref_src=twsrc%5Etfw">September 27, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>But I was busy with TigerBeetle until December of 2023 when I was
messaged on LinkedIn by <a href="https://x.com/georg_kreuzmayr?lang=en">Georg
Kreuzmayr</a>, a graduate student
at Technical University of Munich (TUM).</p>
<p>Georg and his friends, fellow graduate students at TUM, started a
database club: <a href="https://www.tumuchdata.club/">TUMuchData</a>. We got to
talking about opportunities for collaboration and I started feeling a
bit embarrassed that a graduate student had more guts than I had to
get
<a href="https://notes.eatonphil.com/eight-years-of-tech-meetups.html">back</a>
onto the meetup organizer wagon.</p>
<p>A week later, with assurance from <a href="https://twitter.com/justinjaffray">Justin
Jaffray</a> that at least he would
show up with me if no one else did, I started the <a href="https://eatonphil.com/nyc-systems-coffee-club.html">NYC Systems Coffee
Club</a> to bring
together folks in NYC interested in any topic of systems programming
(e.g. compilers, databases, web browser internals, distributed
systems, formal methods, etc.). To bring them together in a completely
informal setting for coffee at 9am in a public space in
midtown Manhattan.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Trying something new! If you're a dev in NYC working <br>on (or interested in) systems programming, grab a coffee and come hang out at 1 Bryant Park (indoor space) this Thursday 9AM - 9:30AM.<br><br>See post for details and fill out the Google Form or DM me!<a href="https://t.co/A4bzcPGy6x">https://t.co/A4bzcPGy6x</a> <a href="https://t.co/n1ECMd59ev">pic.twitter.com/n1ECMd59ev</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1734216183459512486?ref_src=twsrc%5Etfw">December 11, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>I set up that linked web page and started collecting subscribers to
the club via Google Form. Once a month I'd send an email out to the
list asking for RSVPs to this month's coffee club. The first 20 to
respond would get a calendar invite.</p>
<p><img src="/assets/coffee-club-invite.png" alt="/assets/coffee-club-invite.png"></p>
<p>And about the same time I started asking around on Twitter/LinkedIn if
someone would be interested in co-organizing a new systems programming
meetup in NYC. <a href="https://twitter.com/ngeloxyz">Angelo Saraceno</a>
immediately took me up on the idea and we met up.</p>
<h3 id="nyc-systems">NYC Systems</h3><p>We agreed on the premise: this would be a language- and
framework-agnostic meetup that was focused on engineering challenges,
not product pitches. It would be 100% for the sake of corporate
marketing, but corporate marketing of the <em>engineering team</em>, not the
product.</p>
<p><a href="https://nycsystems.xyz/">NYC Systems</a> was born!</p>
<p>We'd find speakers who could start broad and dive deep into some
interesting aspect of databases, programming languages, distributed
systems, and so on. Product pitches were necessary to establish a
context, but the focus of the talk would be about some interesting
recent technical challenge and how they dealt with it.</p>
<p>We'd schedule talks only every other month to ease our own burden in
organizing and finding great speakers.</p>
<p>Once Angelo and I had decided to go forward, the next two challenges
were finding speakers and finding a venue. Thanks to Twitter and
LinkedIn, finding speakers turned out to be the easy part.</p>
<p>It was harder to find a venue. It was surprisingly challenging to find
a company in NYC with a shared vision that the important thing about
being associated with a meetup like this is to be associated with the
quality of speakers and audience we can bring in by not allowing
transparent product pitches.</p>
<p>Almost every company in Manhattan with space we spoke with had a
requirement that they have their own speaker each night. That seemed
like a bad idea.</p>
<p>I think it was especially challenging to find a company willing to
relax about branding requirements like this because we were a new
meetup.</p>
<p>It was pretty frustrating not to find a sympathetic company with space
in Manhattan. And the only reason we didn't give up was that Angelo
was so adamant that this kind of meetup actually happen. It's always
best to start something new with someone else for this exact
reason. You can keep each other going.</p>
<p>In the end we went with the company that did not insist on their
own speaker or their own branding: a Brooklyn-based company whose CEO
immediately got in touch with me to say they wanted to host us, <a href="https://trailofbits.com/">Trail
of Bits</a>.</p>
<h3 id="how-it-works">How it works</h3><p>To keep things easy, I set up a web page on my personal site with
information about the meetup. (Eventually we moved this to
<a href="https://nycsystems.xyz/">nycsystems.xyz</a>.) I set up a Google Form to
collect emails for a mailing list. And we started posting about the
group on Twitter and LinkedIn.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Very pleased to share the first NYC Systems Talks are taking place next Thursday Feb 22nd 6PM. Hosted by <a href="https://twitter.com/trailofbits?ref_src=twsrc%5Etfw">@trailofbits</a>, with <a href="https://twitter.com/paulgb?ref_src=twsrc%5Etfw">@paulgb</a> and <a href="https://twitter.com/StefanKarpinski?ref_src=twsrc%5Etfw">@StefanKarpinski</a> speaking.<br><br>Space is not infinite, fill out the Google Form if you can attend and would like an invite!<a href="https://t.co/jNssr5v1kJ">https://t.co/jNssr5v1kJ</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1758249063550447768?ref_src=twsrc%5Etfw">February 15, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>We published the event calendar in advance (an HTML table on the
website) and announced each event's speakers a week in advance of the
event. I'd send another Google Form to the mailing list taking RSVPs
for the night. The first 60 people to respond got a Google Calendar
invite.</p>
<p><img src="/assets/nyc-systems.png" alt="/assets/nyc-systems.png"></p>
<p>It's a bit of work, sure, but I'd do anything to avoid Meetup.com.</p>
<p class="note">
It is interesting to see every new systems programming meetup also
not pick Meetup.com. The only one that went with it, Munich Database
Meetup, is a revival of an existing group, the Munich NoSQL Meetup
and presumably they didn't want to give up their subscribers. Though
most others use lu.ma.
</p><p>The mailing list now has over 400 people. And for each event RSVP we
have a wait list of 20-30 people. Of course, although 60 people say Yes
initially, by the time of the event we typically get about 50
people in attendance.</p>
<p>At each event, Trail of Bits provided screens, chairs, food, and
drink. Angelo had recording equipment so he took over audio/video
capturing (and later editing and publishing).</p>
<p>After each event we'd publish talk videos to our
<a href="https://www.youtube.com/@NYCSystems">@NYCSystems</a> YouTube.</p>
<h3 id="network-effects">Network effects</h3><p>In March 2024, the TUMuchData folks joined <a href="https://x.com/ifesdjeen">Alex
Petrov</a>'s Munich NoSQL Meetup to form the
Munich Database Meetup. In May, <a href="https://twitter.com/thegeeknarrator">Kaivalya
Apte</a> and <a href="https://twitter.com/mgill25">Manish
Gill</a> started the Berlin Systems Group,
inspired by Alex and the Munich Database Meetup.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I want to start a Berlin Database/Storage systems group, where we have regular meetups, discussions and talks. <br><br>WDYT? <a href="https://twitter.com/mgill25?ref_src=twsrc%5Etfw">@mgill25</a> <a href="https://twitter.com/mehd_io?ref_src=twsrc%5Etfw">@mehd_io</a> <a href="https://twitter.com/ClickHouseDB?ref_src=twsrc%5Etfw">@ClickHouseDB</a> <a href="https://twitter.com/SnowflakeDB?ref_src=twsrc%5Etfw">@SnowflakeDB</a> <a href="https://twitter.com/awscloud?ref_src=twsrc%5Etfw">@awscloud</a> <a href="https://twitter.com/GoogleDE?ref_src=twsrc%5Etfw">@GoogleDE</a> <a href="https://twitter.com/TUBerlin?ref_src=twsrc%5Etfw">@TUBerlin</a> <br><br>Can I get some support? Who else would be interested? <a href="https://twitter.com/hashtag/Databases?src=hash&ref_src=twsrc%5Etfw">#Databases</a> <br><br>Thanks…</p>— Kaivalya Apte - The Geek Narrator (@thegeeknarrator) <a href="https://twitter.com/thegeeknarrator/status/1790782561515372676?ref_src=twsrc%5Etfw">May 15, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>In May 2024, two PhD students in the San Francisco Bay Area, <a href="https://x.com/ShadajL">Shadaj
Laddad</a> and <a href="https://x.com/conor_power23">Conor
Power</a>, started the SF Distributed
Systems meetup.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">We’re super excited to be organizing a new SF Distributed Systems meetup NEXT WEEK! Our first meetup features <a href="https://twitter.com/julianhyde?ref_src=twsrc%5Etfw">@julianhyde</a> and <a href="https://twitter.com/conor_power23?ref_src=twsrc%5Etfw">@conor_power23</a> presenting work on extending SQL and applying algebraic properties, sign up at <a href="https://t.co/d2lLDaQ5iJ">https://t.co/d2lLDaQ5iJ</a></p>— Shadaj Laddad (@ShadajL) <a href="https://twitter.com/ShadajL/status/1790767187327889456?ref_src=twsrc%5Etfw">May 15, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>And in July 2024, <a href="https://twitter.com/shraddhaag">Shraddha Agrawal</a>,
<a href="https://twitter.com/anirudhRowjee">Anirudh Rowjee</a> and friends kicked
off the first Bengaluru Systems Meetup.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Are you ready, Systems Enthusiasts of Bengaluru?<br><br>Speaking at our first-ever meetup on 6th July, we have:<a href="https://twitter.com/simsimsandy?ref_src=twsrc%5Etfw">@simsimsandy</a> with "Learn about the systems that power GenAI applications" and <a href="https://twitter.com/vivekgalatage?ref_src=twsrc%5Etfw">@vivekgalatage</a> with "The Browser Backstage: Performance vs Security" <br>(talks linked below!)</p>— Bengaluru Systems Meetup (@BengaluruSys) <a href="https://twitter.com/BengaluruSys/status/1808949578307183060?ref_src=twsrc%5Etfw">July 4, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<h3 id="suggestions">Suggestions</h3><p>First off, don't pay for anything yourself. Find a company that will
host. At the same time, don't feel the need to give in too much to the
demands of the company. I'd be happy to help you think through how to
talk about the event with companies. It is mutually beneficial for
them to give a 5-minute hiring/product pitch, with no need for
extensive branding or a 30-minute product tutorial.</p>
<p>Second, keep a bit of pressure on speakers to not do an overview talk
and not to do a product pitch. Suggest that they tell the story of
some interesting recent bug or interesting recent feature. What
happened? Why was it hard? What did you learn?</p>
<p>Focusing on these types of talks will help you get a really
interesting audience.</p>
<p>I have been continuously surprised and impressed at the folks who show
up for NYC Systems. It's a mix of technical founders in the systems
space, pretty experienced developers in the systems space, graduate
students, and developers of all sorts.</p>
<p>I am certain we can only get these kinds of folks to show up because
we avoid product pitch-type talks.</p>
<p>Third, finding speakers is still hard! The best approach so far has
been to individually message folks in industry and academia who hang
out on Twitter. Sending out a public call is easy but doesn't often
pan out. So keep an eye on interesting companies in the area.</p>
<p>Another avenue I've been thinking about is messaging VC connections to
ask them if they know any engineers/technical founders/CTOs in the
area who could give an interesting technical talk.</p>
<p>Fourth, speak with other organizers! I finally met Alex Petrov in
person last month and we had a <a href="https://twitter.com/ifesdjeen/status/1806677549038063901">great
time</a>
talking about the challenges and joys of organizing really high
quality meetups.</p>
<p>I'm always happy to chat; my DMs are open.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New post telling a bit of the history behind <a href="https://t.co/NEh1tm8v3Q">https://t.co/NEh1tm8v3Q</a>; why it only exists due to folks like <a href="https://twitter.com/georg_kreuzmayr?ref_src=twsrc%5Etfw">@georg_kreuzmayr</a> and <a href="https://twitter.com/ngeloxyz?ref_src=twsrc%5Etfw">@ngeloxyz</a>; the explosion of systems meetups around the world; and encouragement and suggestions for future organizers!<a href="https://t.co/dwe4TtmXKK">https://t.co/dwe4TtmXKK</a> <a href="https://t.co/ZMLkVYdZDJ">pic.twitter.com/ZMLkVYdZDJ</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1809934997442498812?ref_src=twsrc%5Etfw">July 7, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-07-07-systems-meetups.htmlSun, 07 Jul 2024 00:00:00 +0000
- A write-ahead log is not a universal part of durabilityhttp://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.html<p>A database does not need a write-ahead log (WAL) to achieve
durability. A database can write its long-term data structure durably
to disk before returning to a client. Granted, this is a bad idea! And
granted, a WAL <b>is</b> critical for durability <b>by design</b> in most
databases. But I think it's helpful to understand WALs by
understanding what you <b>could</b> do without them.</p>
<p>So let's look at what terrible design we can make for a durable
database that has no write-ahead log, to motivate the idea of, and
build an intuition for, a write-ahead log.</p>
<p>Thank you to Alex Miller for reviewing a version of this post.</p>
<p>But first, what is durability?</p>
<h3 id="durability">Durability</h3><p>Durability happens in the context of a request a client makes to a
data system (either an embedded system like SQLite or RocksDB or a
standalone system like Postgres). Durability is a spectrum of
guarantees the server provides when a client requests to write some
data: that either the request succeeds and the data is safely written
to disk, or the request fails and the client must retry or decide to
do something else.</p>
<p>It can be difficult to set an absolute definition for durability since
different databases have different concepts of what can go wrong with
disks (also called a "storage fault model"), or they have no concept
at all.</p>
<p>Let's start from the beginning.</p>
<h4 id="an-in-memory-database">An in-memory database</h4><p>An in-memory database has no durability at all. Here is pseudo-code
for an in-memory database service.</p>
<div class="highlight"><pre><span></span><span class="n">db</span> <span class="o">=</span> <span class="n">btree</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">handle_write</span><span class="p">(</span><span class="n">req</span><span class="p">):</span>
    <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">req</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
    <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{}</span>

<span class="k">def</span> <span class="nf">handle_read</span><span class="p">(</span><span class="n">req</span><span class="p">):</span>
    <span class="n">value</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">)</span>
    <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{</span><span class="s2">"value"</span><span class="p">:</span> <span class="n">value</span><span class="p">}</span>
</pre></div>
<p>Throughout this post, for the sake of code brevity, imagine that the
environment is concurrent and that data races around shared mutable
values like <code>db</code> are protected somehow.</p>
<h4 id="writing-to-disk">Writing to disk</h4><p>If we want to achieve the most basic level of durability, we can write
this database to a file.</p>
<div class="highlight"><pre><span></span><span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"kv.db"</span><span class="p">)</span>
<span class="n">db</span> <span class="o">=</span> <span class="n">btree</span><span class="o">.</span><span class="n">init_from_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">handle_write</span><span class="p">(</span><span class="n">req</span><span class="p">):</span>
    <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">req</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
    <span class="n">db</span><span class="o">.</span><span class="n">write_to_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{}</span>

<span class="k">def</span> <span class="nf">handle_read</span><span class="p">(</span><span class="n">req</span><span class="p">):</span>
    <span class="n">value</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">)</span>
    <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{</span><span class="s2">"value"</span><span class="p">:</span> <span class="n">value</span><span class="p">}</span>
</pre></div>
<p><code>btree.write_to_disk</code> will call
<a href="https://linux.die.net/man/2/pwrite">pwrite(2)</a> under the hood. And
we'll assume it does copy-on-write for only changed pages. So imagine
we have a large database represented by a btree that takes up 10GiB on
disk. With the btree algorithm, if we write a single entry to the
btree, often only a single (typically 4KiB) page will get written rather
than all pages (holding all values) in the tree. At the same time, in
the worst case, the entire tree (all 10GiB of data) may need to get
rewritten.</p>
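<p>To make the page arithmetic concrete, here is a minimal Python sketch
(separate from the pseudo-code btree above). The 4KiB page size and the
<code>write_page</code> and <code>pages_in</code> helpers are assumptions made up
for illustration. <code>os.pwrite</code> is Python's wrapper around pwrite(2) and
is only available on Unix-like systems.</p>

```python
import os

PAGE_SIZE = 4096  # Assumed page size (4KiB).

def write_page(fd, page_number, data):
    # Pad the payload to exactly one page and write it at that page's offset.
    page = data.ljust(PAGE_SIZE, b"\x00")
    if len(page) != PAGE_SIZE:
        raise ValueError("payload does not fit in one page")
    return os.pwrite(fd, page, page_number * PAGE_SIZE)

def pages_in(size_bytes):
    # Number of pages needed to hold size_bytes (rounding up).
    return (size_bytes + PAGE_SIZE - 1) // PAGE_SIZE
```

<p>Under these assumptions a 10GiB file holds 2,621,440 pages, so the
difference between rewriting one page and rewriting all of them is
large.</p>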
<p>But this code isn't crash-safe. If the virtual or physical machine
this code is running on reboots, the data we wrote to the file may not
actually be on disk.</p>
<h4 id="fsync">fsync</h4><p>File data is buffered by the operating system by default. By general
consensus, writing data without flushing the operating system buffer
is not considered durable. Every so often a new database will show up
on Hacker News claiming to beat all other databases on insert speed
until a commenter points out the new database doesn't actually flush
data to disk.</p>
<p>In other words, the commonly accepted requirement for durability is
that not only do you write data to a file on disk but you
<a href="https://man7.org/linux/man-pages/man2/fsync.2.html">fsync(2)</a> the
file you wrote. This forces the operating system to flush to disk any
data it has buffered.</p>
<div class="highlight"><pre><span></span><span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"kv.db"</span><span class="p">)</span>
<span class="n">db</span> <span class="o">=</span> <span class="n">btree</span><span class="o">.</span><span class="n">init_from_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">handle_write</span><span class="p">(</span><span class="n">req</span><span class="p">):</span>
    <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">req</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
    <span class="n">db</span><span class="o">.</span><span class="n">write_to_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
    <span class="n">f</span><span class="o">.</span><span class="n">fsync</span><span class="p">()</span> <span class="c1"># Force a flush</span>
    <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{}</span>

<span class="k">def</span> <span class="nf">handle_read</span><span class="p">(</span><span class="n">req</span><span class="p">):</span>
    <span class="n">value</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">)</span>
    <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{</span><span class="s2">"value"</span><span class="p">:</span> <span class="n">value</span><span class="p">}</span>
</pre></div>
<p>Furthermore, you must not ignore fsync failure. How you deal with fsync
failure is up to you, but exiting immediately with a message that the
user should restore from a backup is sometimes considered acceptable.</p>
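<p>Here is a hedged sketch of the write-then-fsync rule in real Python
rather than pseudo-code. The <code>durable_write</code> helper, and the fact that
it overwrites the whole file, are assumptions for illustration. The
point is only that the function does not return until <code>os.fsync</code> has
succeeded, and that an fsync error propagates to the caller instead of
being swallowed.</p>

```python
import os

def durable_write(path, data):
    # Write data to path and return only once fsync has succeeded.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Raises OSError on failure. Do not catch and ignore it.
        os.fsync(fd)
    finally:
        os.close(fd)
```

<p>A caller that gets an <code>OSError</code> out of this function should treat the
write as not durable, in line with the advice above.</p>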
<p>Databases don't like to fsync because it's slow. Many major databases
offer modes where they do not fsync data files before returning a
success to a client. Postgres
<a href="https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-FSYNC">offers</a>
this unsafe mode, though does not default to it and warns against
it. MongoDB offers this unsafe mode but <a href="https://www.mongodb.com/docs/manual/core/journaling/#journaling-process">does not
default</a>
to it.</p>
<p class="note">
An earlier version of this post said that MongoDB would unsafely
flush on an interval. Daniel Gomez Ferro from MongoDB messaged me
that while the docs are confusing, the default write concern
"majority" does actually imply "j: true" which means data is
synchronized (i.e. fsync-ed) before returning a success to a client.
</p><p>Almost every database trades safety for performance in some
regard. For example, few databases other than SQLite and Cockroach
default to Serializable Isolation, even though it is commonly agreed
that basically no level below Serializable Isolation (which is what
all other databases default to) can be reasoned about. Other databases
offer Serializable Isolation; they just don't default to it, because
it can be slow.</p>
<h4 id="group-commit">Group commit</h4><p>But let's get back to fsync. One way to amortize the cost of fsync is
to delay requests so that you write data from each of them and then
fsync the data from all requests. This is sometimes called group
commit.</p>
<p>For example, we could update the database in-memory but have a
background thread serialize to disk and call fsync only every 5ms.</p>
<div class="highlight"><pre><span></span><span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"kv.db"</span><span class="p">)</span>
<span class="n">db</span> <span class="o">=</span> <span class="n">btree</span><span class="o">.</span><span class="n">init_from_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="n">group_commit_sems</span> <span class="o">=</span> <span class="p">[]</span>
<span class="nd">@background_worker</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">group_commit</span><span class="p">():</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">clock</span><span class="p">()</span> <span class="o">%</span> <span class="mi">5</span><span class="n">ms</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">db</span><span class="o">.</span><span class="n">write_to_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
            <span class="n">f</span><span class="o">.</span><span class="n">fsync</span><span class="p">()</span> <span class="c1"># Durably flush for the group</span>
            <span class="k">for</span> <span class="n">sem</span> <span class="ow">in</span> <span class="n">group_commit_sems</span><span class="p">:</span>
                <span class="n">sem</span><span class="o">.</span><span class="n">signal</span><span class="p">()</span>
            <span class="n">group_commit_sems</span><span class="o">.</span><span class="n">clear</span><span class="p">()</span> <span class="c1"># Don't signal the same waiters again next round</span>

<span class="k">def</span> <span class="nf">handle_write</span><span class="p">(</span><span class="n">req</span><span class="p">):</span>
    <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">req</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
    <span class="n">sem</span> <span class="o">=</span> <span class="n">semaphore</span><span class="p">()</span>
    <span class="n">group_commit_sems</span><span class="o">.</span><span class="n">push</span><span class="p">(</span><span class="n">sem</span><span class="p">)</span>
    <span class="n">sem</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
    <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{}</span>

<span class="k">def</span> <span class="nf">handle_read</span><span class="p">(</span><span class="n">req</span><span class="p">):</span>
    <span class="n">value</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">)</span>
    <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{</span><span class="s2">"value"</span><span class="p">:</span> <span class="n">value</span><span class="p">}</span>
</pre></div>
<p>It is critical that <code>handle_write</code> waits to return a success until the
write is durable via fsync.</p>
<p>So to reiterate, the key idea for durability of a client request is
that you have some version of the client message stored on disk
durably with fsync before returning a success to a client.</p>
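<p>For a version of group commit you can actually run, here is a minimal
sketch using Python threads. The <code>GroupCommitter</code> class, its
<code>commit</code> method, and the <code>flush</code> callback are all hypothetical
names I made up for illustration. It assumes the caller has already
applied its write in memory before calling <code>commit</code>, and that
<code>flush</code> does the write and fsync. It does not handle a failing
<code>flush</code>; a real implementation would need to report that error to
every waiter.</p>

```python
import threading
import time

class GroupCommitter:
    def __init__(self, flush, interval=0.005):
        self.flush = flush        # Callable that writes and fsyncs once.
        self.interval = interval  # Assumed 5ms batching window.
        self.lock = threading.Lock()
        self.pending = []
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            time.sleep(self.interval)
            # Take ownership of the current batch so that later arrivals
            # wait for the next flush instead of this one.
            with self.lock:
                batch, self.pending = self.pending, []
            if batch:
                self.flush()  # One durable flush for the whole batch.
                for done in batch:
                    done.set()

    def commit(self):
        # Block until a flush that started after this call has finished.
        done = threading.Event()
        with self.lock:
            self.pending.append(done)
        done.wait()
```

<p>Swapping the batch out under the lock before flushing is the design
choice that matters here: a request is only acknowledged by a flush
that began after the request was registered.</p>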
<p>From now on in this post, when you see "durable" or "durability", it
means that the data has been written and fsync-ed to disk.</p>
<h3 id="optimizing-durable-writes">Optimizing durable writes</h3><p>A key insight is that it's silly to serialize the entire permanent
structure of the database to disk every time a user writes.</p>
<p>We could just write the user's message itself to an append-only
log. And then only periodically write the entire btree to disk. So
long as we have fsync-ed the append-only log file, we can safely
return to the user even if the btree itself has not yet been written
to disk.</p>
<p>The additional logic this requires is that on startup we must read the
btree from disk and then replay the log on top of the btree.</p>
<div class="highlight"><pre><span></span><span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"kv.db"</span><span class="p">,</span> <span class="s2">"rw"</span><span class="p">)</span>
<span class="n">db</span> <span class="o">=</span> <span class="n">btree</span><span class="o">.</span><span class="n">init_from_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
<span class="n">log_f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"kv.log"</span><span class="p">,</span> <span class="s2">"rw"</span><span class="p">)</span>
<span class="n">l</span> <span class="o">=</span> <span class="n">log</span><span class="o">.</span><span class="n">init_from_disk</span><span class="p">(</span><span class="n">log_f</span><span class="p">)</span>

<span class="k">for</span> <span class="n">log</span> <span class="ow">in</span> <span class="n">l</span><span class="o">.</span><span class="n">read_logs_from</span><span class="p">(</span><span class="n">db</span><span class="o">.</span><span class="n">last_log_index</span><span class="p">):</span>
    <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">log</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">log</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>

<span class="n">group_commit_sems</span> <span class="o">=</span> <span class="p">[]</span>

<span class="nd">@background_worker</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">group_commit</span><span class="p">():</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="n">log_accumulator</span> <span class="o">=</span> <span class="n">log_page</span><span class="p">()</span>
        <span class="k">if</span> <span class="n">clock</span><span class="p">()</span> <span class="o">%</span> <span class="mi">5</span><span class="n">ms</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="k">for</span> <span class="p">(</span><span class="n">log</span><span class="p">,</span> <span class="n">_</span><span class="p">)</span> <span class="ow">in</span> <span class="n">group_commit_sems</span><span class="p">:</span>
                <span class="n">log_accumulator</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">log</span><span class="p">)</span>

            <span class="n">log_f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">log_accumulator</span><span class="o">.</span><span class="n">page</span><span class="p">())</span> <span class="c1"># Write out all log entries at once</span>
            <span class="n">log_f</span><span class="o">.</span><span class="n">fsync</span><span class="p">()</span> <span class="c1"># Durably flush wal data</span>
            <span class="k">for</span> <span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">sem</span><span class="p">)</span> <span class="ow">in</span> <span class="n">group_commit_sems</span><span class="p">:</span>
                <span class="n">sem</span><span class="o">.</span><span class="n">signal</span><span class="p">()</span>
            <span class="n">group_commit_sems</span><span class="o">.</span><span class="n">clear</span><span class="p">()</span> <span class="c1"># Don't re-log or re-signal this batch next round</span>

        <span class="k">if</span> <span class="n">clock</span><span class="p">()</span> <span class="o">%</span> <span class="mi">1</span><span class="n">m</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">db</span><span class="o">.</span><span class="n">write_to_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
            <span class="n">f</span><span class="o">.</span><span class="n">fsync</span><span class="p">()</span> <span class="c1"># Durably flush db data</span>

<span class="k">def</span> <span class="nf">handle_write</span><span class="p">(</span><span class="n">req</span><span class="p">):</span>
    <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">req</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
    <span class="n">sem</span> <span class="o">=</span> <span class="n">semaphore</span><span class="p">()</span>
    <span class="n">log</span> <span class="o">=</span> <span class="n">req</span>
    <span class="n">group_commit_sems</span><span class="o">.</span><span class="n">push</span><span class="p">((</span><span class="n">log</span><span class="p">,</span> <span class="n">sem</span><span class="p">))</span>
    <span class="n">sem</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span> <span class="c1"># This time waiting for only the log to be written and flushed, not the btree.</span>
    <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{}</span>

<span class="k">def</span> <span class="nf">handle_read</span><span class="p">(</span><span class="n">req</span><span class="p">):</span>
    <span class="n">value</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">)</span>
    <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{</span><span class="s2">"value"</span><span class="p">:</span> <span class="n">value</span><span class="p">}</span>
</pre></div>
<p>This is a write-ahead log!</p>
<p>Consider a few scenarios. One request writes the smallest key ever
seen. And one request within the same millisecond writes the largest
key ever seen. Writing these to disk on the btree means modifying at
least two pages spread out in space on disk.</p>
<p>But if we only have to durably write these two messages to a log, they
can likely both be included in the same log page. ("Likely" so long as
key and values are small enough that multiple can fit into the same
page.)</p>
<p>That is, it's cheaper to write only these small messages representing
the client request to disk. And we save the structured btree
persistence for a less frequent durable write.</p>
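<p>To see the append-and-replay idea end to end, here is a small runnable
sketch. Everything in it is an assumption for illustration: the
<code>TinyWAL</code> name, the JSON-lines log format, and a plain dict standing
in for the btree. It skips group commit, never checkpoints the dict to
its own file (so the log grows forever), and does not handle a torn
final log entry after a crash.</p>

```python
import json
import os

class TinyWAL:
    def __init__(self, path):
        self.db = {}  # Stand-in for the in-memory btree.
        # On startup, replay every logged request in order.
        if os.path.exists(path):
            with open(path, "r") as f:
                for line in f:
                    entry = json.loads(line)
                    self.db[entry["key"]] = entry["value"]
        self.log = open(path, "a")

    def write(self, key, value):
        # Append the request to the log and fsync before acknowledging.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.log.flush()             # Python buffer to OS buffer.
        os.fsync(self.log.fileno())  # OS buffer to disk.
        self.db[key] = value

    def close(self):
        self.log.close()
```

<p>Reopening the same path after a restart rebuilds the dict purely from
the log, which is the recovery step the pseudo-code above performs
before serving requests.</p>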
<h3 id="filesystem-and-disk-bugs">Filesystem and disk bugs</h3><p>Sometimes filesystems will write data to the wrong place. Sometimes
disks corrupt data. A solution to both of these is to checksum the
data on write, store the checksum on disk, and confirm the checksum on
read. This combined with a background process called scrubbing to
validate unread data can help you learn quickly when your data has
been corrupted and you must recover from backup.</p>
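<p>Checksumming on write and verifying on read can be sketched in a few
lines. This is only an illustration: the <code>seal</code> and <code>unseal</code>
helpers and the record layout (a 4-byte CRC32 followed by the payload)
are assumptions of mine, and real databases choose their own checksum
algorithms and page formats.</p>

```python
import struct
import zlib

def seal(payload):
    # Prefix the payload with its CRC32 so corruption can be detected later.
    return struct.pack("!I", zlib.crc32(payload)) + payload

def unseal(record):
    # Verify the stored checksum and return the payload, or raise on mismatch.
    (stored,) = struct.unpack("!I", record[:4])
    payload = record[4:]
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: data is corrupt")
    return payload
```

<p>A scrubber in this scheme would just call <code>unseal</code> on records that
have not been read recently and alert on any <code>ValueError</code>.</p>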
<p>MongoDB's default storage engine WiredTiger <b>does</b> checksum data <a href="https://github.com/wiredtiger/wiredtiger/blob/develop/src/docs/tune-checksum.dox#L3">by
default</a>.</p>
<p>But some databases famous for integrity do not. Postgres does <a href="https://www.postgresql.org/docs/current/checksums.html">no data
checksumming</a>
by default:</p>
<blockquote><p>By default, data pages are not protected by checksums, but this can
optionally be enabled for a cluster. When enabled, each data page
includes a checksum that is updated when the page is written and
verified each time the page is read. Only data pages are protected by
checksums; internal data structures and temporary files are not.</p>
</blockquote>
<p>SQLite likewise does no checksumming by default. Checksumming is an
<a href="https://www.sqlite.org/cksumvfs.html">optional extension</a>:</p>
<blockquote><p>The checksum VFS extension is a VFS shim that adds an 8-byte
checksum to the end of every page in an SQLite database. The checksum
is added as each page is written and verified as each page is
read. The checksum is intended to help detect database corruption
caused by random bit-flips in the mass storage device.</p>
</blockquote>
<p>But even this isn't perfect. Disks and nodes can fail completely. At
that point you can only improve durability by introducing redundancy
across disks (and/or nodes), for example, via distributed consensus.</p>
<h3 id="other-reasons-you-<em>need</em>-a-wal?">Other reasons you <em>need</em> a WAL?</h3><p>Some databases (like SQLite) require a write-ahead log to implement
aspects of ACID transactions. But this need not be a requirement for
ACID transactions if you do MVCC (SQLite does not). See my previous
post on <a href="https://notes.eatonphil.com/2024-05-16-mvcc.html">implementing
MVCC</a> for details.</p>
<p>Logical replication (also called change data capture (CDC)) is another
interesting feature that requires a write-ahead log. The idea is that
the log already preserves the exact order and changes that affect the
database's "state machine". So we could copy these changes out of the
system by tracking the write-ahead log, preserving change order, and
apply these changes to a foreign system.</p>
<p>But again, CDC is not about durability. It's an ancillary feature
that write-ahead logs make simple.</p>
<h3 id="conclusion">Conclusion</h3><p>A few key points. First, durability primarily matters if it is
established before returning a success to the client. Second, a
write-ahead log is a cheap way to get durability.</p>
<p>And finally, durability is a spectrum. You need to read the docs for
your database to understand what it does and does not guarantee.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Here's a new post about durability and write-ahead logs. Write-ahead logs are used almost everywhere. But to build an intuition for why, it is helpful to imagine what you would do without a WAL. And to explore the meaning of durability.<a href="https://t.co/nzS2pMz22z">https://t.co/nzS2pMz22z</a> <a href="https://t.co/m1n9x8CNcp">pic.twitter.com/m1n9x8CNcp</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1807741130093556098?ref_src=twsrc%5Etfw">July 1, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.htmlMon, 01 Jul 2024 00:00:00 +0000
- The limitations of LLMs, or why are we doing RAG?http://notes.eatonphil.com/2024-06-17-limitations-llm-or-why-are-we-doing-rag.html<head>
<meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/limitations-llm-or-why-are-we-doing-rag'" />
</head><p>This is an external post of mine. Click
<a href="https://www.enterprisedb.com/blog/limitations-llm-or-why-are-we-doing-rag">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2024-06-17-limitations-llm-or-why-are-we-doing-rag.htmlMon, 17 Jun 2024 00:00:00 +0000
- Confusion is a musehttp://notes.eatonphil.com/2024-06-14-confusion-is-a-muse.html<p>Some of the most interesting technical blog posts I read come from
confusion, and confusion is a common reason for posts I write. You're at work
and you start asking questions that are difficult to answer. You spend
a few hours or a day trying to get to the bottom of things.</p>
<p>If you ask a question to very experienced and successful developers at
work, they have a tendency not to give context and to simplify things
down to a single answer. This may be a good way to make business
decisions. (One can't afford to waste an eternity considering
everything indefinitely.) But accepting an answer you don't understand
is actively harmful for building intuition.</p>
<p>Certainly, sometimes not accepting an answer can be irritating. You'll
have to figure that out.</p>
<p>But beyond "go along to get along", another reason we don't pursue
what we're confused about is that we're embarrassed that we're
confused in the first place. What's worse, the embarrassment we feel
naturally grows the more experienced we get. "I've got this job title,
I don't want to seem like I don't know what you mean."</p>
<p>But if you fight the embarrassment and pursue your confusion
regardless, you'll likely figure something very interesting
out. Moreover, you will probably not have been the only person who was
confused. At least personally it is quite rare that I am confused
about something and no one else is.</p>
<p>So pay attention when you get confused, and consider why it
happened. What did you expect to be the case, and how did reality
differ? Explore the angles and the options. When you finally
understand, think about what led you to that understanding.</p>
<p>Write it down. Put it into an internal Markdown doc, an internal
Atlassian doc, an internal Google Slides page, whatever. The medium
doesn't matter.</p>
<p>This entire process doesn't come easily. We feel embarrassed. We
aren't used to lingering on something we're confused by. We aren't
used to writing things down.</p>
<p>But if you can make yourself pause every once in a while and think
about what you (or someone around you) got confused by, and if you can
force yourself to stop getting embarrassed by what you got confused
by, and if you can write down the background and the reasoning that
led to your ultimate understanding, you're going to have something
pretty interesting to talk about.</p>
<p>You'll contribute to the growth and intuition of your colleagues. And
you'll never run out of things to write about.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Confusion is embarrassing. But fight that feeling, and dig into why you're confused. And write it down.<br><br>You won't be the only one who was confused. And you'll tend to have something pretty interesting to talk about.<a href="https://t.co/IdX1nGBheR">https://t.co/IdX1nGBheR</a> <a href="https://t.co/KzTjqMxw6u">pic.twitter.com/KzTjqMxw6u</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1801644601536664014?ref_src=twsrc%5Etfw">June 14, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-06-14-confusion-is-a-muse.htmlFri, 14 Jun 2024 00:00:00 +0000
- How I run a software book clubhttp://notes.eatonphil.com/2024-05-30-how-i-run-book-clubs.html<p>I've been running software book clubs almost continuously since last
summer, about 12 months ago. We read through <a href="https://eatonphil.com/2023-ddia.html">Designing Data-Intensive
Applications</a>, <a href="https://eatonphil.com/2023-database-internals.html">Database
Internals</a>,
<a href="https://eatonphil.com/2024-systems-performance.html">Systems
Performance</a>, and
we just started <a href="https://eatonphil.com/2024-understanding-software-dynamics.html">Understanding Software
Dynamics</a>.</p>
<p>The DDIA discussions were in-person in NYC with about 5-8 consistent
attendees. The rest have been over email with 300, 500, and 600
attendees.</p>
<p>This post is for folks who are interested in running their own book
club. None of these ideas are novel. I co-opted the best parts I saw
from other people running similar things. And hopefully you'll improve
on my experience too, should you try.</p>
<p>Despite the length of this post, running a book club takes almost no
noticeable effort, other than when I need to select and confirm
discussion leaders. It is thanks to this limited effort that I've kept
up the book clubs so consistently.</p>
<h3 id="google-groups">Google Groups</h3><p>I run the virtual book clubs over email. I create a Google Group and
tell people to send me their email for an invite. I use a Google Form
to collect emails since I get many. If you're doing a small group
book club you can just collect member emails directly.</p>
<p>In the Google Form I ask people to volunteer to lead discussion for a
chapter (or chapters). And I ask for a Twitter/GitHub/LinkedIn
account.</p>
<p>When I've gotten enough responses I go through the list and check
Twitter/GitHub/LinkedIn info to find people who might have a
particularly interesting perspective to lead a discussion.</p>
<p>"Lead a discussion" sounds formal but I mean anything but. All I am
looking for is someone to start a new Google Group thread each week
and for them to share their thoughts.</p>
<p>For example a discussion leader might share:</p>
<ul>
<li>What they liked about the chapter</li>
<li>Something new they learned from the chapter</li>
<li>A story about their work that the chapter reminded them of</li>
<li>A little project they hacked on, inspired by reading the chapter</li>
<li>A paper or YouTube video this chapter reminded them of</li>
<li>Something from the chapter that was confusing</li>
<li>Etc.</li>
</ul>
<p>The "discussion leader" has no responsibility for remaining in the
discussion after posting the thread. There just isn't an easier way to
say "person who kicks off discussion" than to call them a "discussion
leader".</p>
<p>By the way, I didn't do discussion leaders for the first book club,
reading DDIA. And that book club took noticeably more effort. Because
I organized it, I was effectively the discussion leader every
time. Having discussion leaders disperses the effort of the book
club. And I think it makes the club much more interesting.</p>
<h4 id="sparknotes-ification">SparkNotes-ification</h4><p>One thing I noticed happening often was that the discussion leader
might do a large summary of the chapter. I greatly appreciate and
respect that effort, but I think this is not the ideal thing to
happen. Of course you can't control what people do and maybe they
really wanted to write a summary. But since noticing this happen I now
try to discourage the discussion leader from summarizing since 1) it
must be quite time-consuming and 2) it isn't as interesting as some of
the above bullet points.</p>
<h4 id="confirming-with-leaders">Confirming with leaders</h4><p>When I have picked out folks who seem like they'd be fun discussion
leaders, I bcc email them all asking them to confirm. At the same time
I explain what being a discussion leader means, just as I explained it
above.</p>
<p>Each week's discussion gets a new Google Group thread. Discussion
happens in responses to the thread.</p>
<p>I ask the discussion leaders to create the new discussion thread
between Friday and Saturday their local time.</p>
<p>For folks who don't confirm, I email them one last time and then if
they still haven't confirmed I find someone new.</p>
<p>I always lead the first week's discussion so that the discussion
leaders can see what I do and so that I can establish the pattern.</p>
<h4 id="managing-leaders">Managing leaders</h4><p>It takes a while to read a book. Sometimes the leaders forget to do
their part. If it gets to be Sunday and the discussion leader for the
week hasn't started discussion, I email them to gently ask if they are
still available to kick off discussion. And if they are not, no
worries, I can step in.</p>
<p>I have had to step in a few times to start discussion and it's no
problem.</p>
<h4 id="managing-non-leaders">Managing non-leaders</h4><p>Just as you need to clarify and set expectations for discussion
leaders, you need to clarify and set expectations for everyone else.</p>
<p>When I invite people to the Google Group I typically also create an
Intro thread where I explain the discussion format.</p>
<p>An annoying aspect of Google Groups is that I cannot limit who can
<em>create</em> a thread without limiting who can <em>respond</em> to a thread.</p>
<p>It would simplify things for me if I could limit thread creation to
discussion leaders. But since I cannot, I try to repeatedly and
explicitly mention in the Intro thread that no one should start a new
discussion thread unless they are a discussion leader. And that new
threads will come out each weekend to discuss the previous chapter.</p>
<h4 id="setting-the-tone">Setting the tone</h4><p>One of the most important things to do in the Intro email is to set
the tone. I try to clarify this is a friendly and encouraging group
focused on learning and improving ourselves. We have experts in the
group and we have noobs in the group and they are all welcome and will
all come away with different things.</p>
<h3 id="why-email?">Why email?</h3><p>Email seems to be the most time-friendly and demographic-friendly
medium. Doing live discussion sounds stressful and difficult to
schedule, although I believe Alex Petrov <a href="https://x.com/ifesdjeen/status/1795813863197409384">runs live
discussions</a>. Email
forces you to slow down and think things through. And email is
built around an inbox. If you didn't get to read some discussion,
you can mark it unread. You can't do that in Discord or Slack.</p>
<h3 id="avoiding-long-term-commitments">Avoiding long-term commitments</h3><p>When I pick a book, aside from picking books I think are likely to be
exceptionally well-written, I try to avoid books that we could not
finish within 3 months. It concerns me to try to get people to commit
to something longer than that.</p>
<p>This has led to some distortion though. Systems Performance has only
16 chapters. One chapter a week means about 3 months in total. But
each chapter is 100 pages long.</p>
<p>I was hesitant to do a reading of Understanding Software Dynamics
because it has 28 chapters. But each chapter is only 10-15 pages
long. So when I decided to go with it, I decided we'd read 2 chapters a
week. Each discussion leader is responsible for 2 chapters at a
time. That means we can finish within 3 months. And each week we read
only 20-30 pages, which is still much more doable than 100 pages of
Systems Performance.</p>
<p>On the other hand, we did make it through Systems Performance! Which
gives me confidence to pick other books that are physically daunting,
should they otherwise seem like a good idea.</p>
<h4 id="a-book-ends">A book ends</h4><p>Many public book clubs go through a book a month and have no
ending. That is totally fair. But what I love about the way I organize
book clubs is that each reading is unrelated to the next. It's an
entirely new signup for each book. You need only "commit" (I mean, you
can drop off whenever and definitely people do) to a 3-month reading
and then you can justly feel good about yourself and join again in the
future or not.</p>
<p>In contrast a paper reading club has no obvious ending, unless you
pick all the papers in advance and organize them around a school year
or something. This has made running a paper reading club feel more
concerning to me. Though I greatly appreciate folks like Aleksey
Charapko and Murat Demirbas <a href="https://charap.co/category/reading-group/">who
do</a>.</p>
<h3 id="most-people-don't-actively-contribute,-but-they-still-value-it">Most people don't actively contribute, but they still value it</h3><p>In a group of 500 people, maybe 1-2% of those people actively
contribute to discussion. 5-10 people. But I often hear from people
who didn't participate that they still highly valued the group. And
this high percentage of non-active-participants is part of why I keep
allowing the group size to grow. There's little work I have to do and
a bunch of people benefit.</p>
<h3 id="doing-it-at-your-company-likely-won't-go-well">Doing it at your company likely won't go well</h3><p>I wrote about this
<a href="https://notes.eatonphil.com/eight-years-of-tech-meetups.html">before</a>. For
some reason it's hard to get people who would otherwise join an
external reading club to join a company-internal reading club.</p>
<p>Though perhaps I'm just doing it wrong because I hear of others like
<a href="https://twitter.com/sqlliz/status/1745463496161325087">Elizabeth Garrett
Christensen</a>
who run an internal software book club successfully.</p>
<h3 id="good-luck,-have-fun!">Good luck, have fun!</h3><p>That's all I've got. Send me questions if you've got any. But mostly,
just give it a shot if you want to and you'll learn!</p>
<p>And if you still don't get it, you can of course just join one of my
book clubs. :)</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Since folks have asked, here's how I run a software book club.<br><br>But also, you could just join and see. :)<a href="https://t.co/tXBrLFYbvC">https://t.co/tXBrLFYbvC</a> <a href="https://t.co/4iW8EfZCeY">pic.twitter.com/4iW8EfZCeY</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1796159854496600164?ref_src=twsrc%5Etfw">May 30, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-05-30-how-i-run-book-clubs.htmlThu, 30 May 2024 00:00:00 +0000
- Implementing MVCC and major SQL transaction isolation levelshttp://notes.eatonphil.com/2024-05-16-mvcc.html<p>In this post we'll build a database in 400 lines of code with basic
support for five major SQL transaction isolation levels: Read Uncommitted,
Read Committed, Repeatable Read, Snapshot Isolation and
Serializable. We'll use multi-version concurrency control (MVCC) and
optimistic concurrency control (OCC) to accomplish this. The goal
isn't to be perfect but to explain the basics in a minimal way.</p>
<p>You don't need to know what these terms mean in advance. I did not
understand them before doing this project. But if you've ever dealt
with SQL databases, transaction isolation levels are likely one of the
dark corners you either 1) weren't aware of or 2) wanted not to think
about. At least, this is how I felt.</p>
<p>While there are many blog posts that list out isolation levels, I
haven't been able to internalize their lessons. So I built this little
database to demonstrate the common isolation levels for myself. It
turned out to be simpler than I expected, and made the isolation
levels much easier to reason about.</p>
<p>Thank you to Justin Jaffray, Alex Miller, Sujay Jayakar, Peter
Veentjer, and Michael Gasch for providing feedback and suggestions.</p>
<p>All code is <a href="https://github.com/eatonphil/gomvcc">available</a> on
GitHub.</p>
<h3 id="why-do-we-need-transaction-isolation?">Why do we need transaction isolation?</h3><p>If you already know the answer, feel free to skip this section.</p>
<p>When I first started working with databases in CRUD applications, I
did not understand the point of transactions. I was fairly certain
that transactions are locks. I was wrong about that, but more on that
later.</p>
<p>I can't remember exact code I wrote, but here's something I could have
written:</p>
<div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">database</span><span class="o">.</span><span class="n">transaction</span><span class="p">()</span> <span class="k">as</span> <span class="n">t</span><span class="p">:</span>
    <span class="n">users</span> <span class="o">=</span> <span class="n">t</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s2">"SELECT * FROM users WHERE group = 'admin';"</span><span class="p">)</span>
    <span class="n">ids</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">user</span> <span class="ow">in</span> <span class="n">users</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">some_complex_logic</span><span class="p">(</span><span class="n">user</span><span class="p">):</span>
            <span class="n">ids</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">user</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
    <span class="n">t</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s2">"UPDATE users SET metadata = 'some value' WHERE id IN ($1);"</span><span class="p">,</span> <span class="n">ids</span><span class="p">)</span>
</pre></div>
<p>I would have thought that all users that were seen from the initial
<code>SELECT</code> who matched the <code>some_complex_logic</code> filter would be exactly
the same that are updated in my second SQL statement.</p>
<p>And if I were using SQLite, my guess would have been correct. But if I
were using MySQL or Postgres or Oracle or SQL Server, and hadn't made
any changes to defaults, that wouldn't necessarily be true! We'll
discover exactly why throughout this post.</p>
<p>For example, some other connection and transaction could have set a
<code>user</code>'s <code>group</code> to <code>admin</code> after the initial <code>SELECT</code> was
executed. It would then be missed from the <code>some_complex_logic</code> check
and from the subsequent <code>UPDATE</code>.</p>
<p>Or, again after our initial <code>SELECT</code>, some other connection could have
modified the <code>group</code> for some user that previously was <code>admin</code>. It
would then be incorrectly part of the second <code>UPDATE</code> statement.</p>
<p>These are just a few examples of what could go wrong.</p>
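<p>To make the first scenario concrete, here is a rough Go simulation of
that interleaving. It is not a real database: the <code>user</code> type,
the in-memory map, and the <code>interleave</code> function are all
hypothetical stand-ins, and "connection B" is just a line of code that
runs between connection A's read and write.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// user is a hypothetical row in a toy "users" table.
type user struct {
	group    string
	metadata string
}

// selectAdmins returns the sorted ids of all users currently in the
// admin group.
func selectAdmins(users map[int]*user) []int {
	var ids []int
	for id, u := range users {
		if u.group == "admin" {
			ids = append(ids, id)
		}
	}
	sort.Ints(ids)
	return ids
}

// interleave simulates connection B promoting a user between
// connection A's SELECT and UPDATE. It returns the ids A updated and
// the ids that are admins once both connections are done.
func interleave() (updated []int, admins []int) {
	users := map[int]*user{
		1: {group: "admin"},
		2: {group: "staff"},
	}

	// Connection A: the initial SELECT.
	seen := selectAdmins(users)

	// Connection B: promotes user 2 before A has finished.
	users[2].group = "admin"

	// Connection A: the UPDATE, using only the ids it saw earlier.
	for _, id := range seen {
		users[id].metadata = "some value"
	}

	return seen, selectAdmins(users)
}

func main() {
	updated, admins := interleave()
	fmt.Println("updated:", updated) // user 2 is missing from this list
	fmt.Println("admins: ", admins)
}
```

<p>User 2 ends up as an admin but never gets the metadata, even though
connection A did everything inside what it believed was one unit of
work.</p>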
<p>This is the realm of transaction isolation. How do multiple
transactions running at the same time, interacting with the same data,
interact with each other?</p>
<p>The answer is: it depends. The SQL standard itself loosely prescribes
four isolation levels. But every database implements these four levels
slightly differently. Sometimes using entirely different
algorithms. And even among the standard levels, the default isolation
level for each database differs.</p>
<p>Funky bugs that can show up across databases and across isolation
levels, often dependent on particular details of common ways of
implementing isolation levels, create what are called
"anomalies". Examples include intimidating terms like "dirty reads"
and "write cycles" and G2-Item.</p>
<p>The topic is so complex that we've got decades of research papers
<a href="https://15721.courses.cs.cmu.edu/spring2019/papers/02-transactions/p1-berenson.pdf">critiquing</a>
SQL isolation levels,
<a href="https://pmg.csail.mit.edu/papers/icde00.pdf">categorization</a> of
common isolation anomalies, walkthroughs of anomalies by Martin
Kleppmann in <a href="https://dataintensive.net/">Designing Data-Intensive
Applications</a>, Martin Kleppmann's
<a href="https://github.com/ept/hermitage">Hermitage</a> project documenting
common anomalies across isolation levels in major databases, and the
<a href="http://www.bailis.org/papers/acidrain-sigmod2017.pdf">ACIDRain paper</a>
showing isolation-related bugs in major open-source ecommerce
projects.</p>
<p>These aren't just random links. They're each quite interesting. And
particularly for practitioners who don't know why they should care,
check out Designing Data-Intensive Applications and the last link on
ACIDRain.</p>
<p>And this is only a small list of some of the most interesting research
and writing on the topic.</p>
<p>So there's a wide variety of things to consider:</p>
<ul>
<li>Not every database implements transaction isolation levels
identically, resulting in different behavior</li>
<li>Not all researchers agree, and not all database developers agree, on
what any given isolation level means</li>
<li>Not every database has the same default isolation level, and most
developers tend not to change the default</li>
<li>Not every developer is correctly using the isolation level they pick
(default or not)</li>
</ul>
<p>Transaction isolation levels are basically vibes. The only truth for
real projects is Martin Kleppmann's <a href="https://github.com/ept/hermitage">Hermitage</a> project that
catalogs behavior across databases. And a truth some people align with
is <a href="https://pmg.csail.mit.edu/papers/icde00.pdf">Generalized Isolation Level
Definitions</a>.</p>
<p>So while all these linked works above are authoritative, and even
though we can see that there might be some anomalies we have to worry
about, the research can still be difficult to internalize. And many
developers, my recent self included, do not have a great understanding
of isolation levels.</p>
<p>Throughout this post we'll stick to informal definitions of isolation
levels to keep things simple.</p>
<p>Let's dig in.</p>
<h3 id="locks?-mvcc?">Locks? MVCC?</h3><p>Historically, databases implemented isolation with locking algorithms
such as <a href="https://faculty.cc.gatech.edu/~jarulraj/courses/8803-s22/slides/13-two-phase-locking-annotated.pdf">Two-Phase
Locking</a>
(not the same thing as <a href="https://www.cs.princeton.edu/courses/archive/fall16/cos418/docs/L6-2pc.pdf">Two-Phase
Commit</a>). Multi-version
concurrency control (MVCC) is an approach that lets us completely
avoid locks.</p>
<p>It's worth noting that while our implementation legitimately avoids locks
(implementing what is called optimistic concurrency control or OCC),
most MVCC databases do still use locks for certain things
(implementing what is called pessimistic concurrency control).</p>
<p>But this is the story of databases in general. There are numerous ways
to implement things.</p>
<p>We will take the simpler lockless route.</p>
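<p>If the optimistic idea is unfamiliar, here is a tiny sketch of it,
separate from the database we're about to build. The <code>cell</code>
and <code>optimisticTx</code> types are made up for illustration. Nothing
is locked while a transaction works; instead each transaction remembers
the version it read and validates that version at commit time.</p>

```go
package main

import "fmt"

// cell is a hypothetical versioned value. The version goes up by one
// on every committed write.
type cell struct {
	version uint64
	value   int
}

// optimisticTx remembers which version it read. No lock is held while
// the transaction does its work.
type optimisticTx struct {
	readVersion uint64
	newValue    int
}

// begin takes a private copy of the current value and version.
func begin(c *cell) optimisticTx {
	return optimisticTx{readVersion: c.version, newValue: c.value}
}

// commit validates at the very end: it succeeds only if nobody else
// committed since this transaction read the cell.
func (t optimisticTx) commit(c *cell) bool {
	if c.version != t.readVersion {
		return false // Conflict: the caller should abort or retry.
	}
	c.value = t.newValue
	c.version++
	return true
}

func main() {
	c := &cell{value: 10}

	a := begin(c)
	b := begin(c)
	a.newValue += 1
	b.newValue += 5

	fmt.Println(a.commit(c)) // true: first committer wins
	fmt.Println(b.commit(c)) // false: conflict found at commit time
	fmt.Println(c.value)     // 11
}
```

<p>A pessimistic design would instead make <code>b</code> wait on a lock
until <code>a</code> finished. The optimistic design lets both proceed
and pays for it with an occasional abort.</p>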
<p>Consider a key-value database. With MVCC, rather than storing only the
value for a key, we would store versions of the value. The version
includes the transaction id (a monotonically increasing integer) wherein
the version was created, and the transaction id wherein the version
was deleted.</p>
<p>With MVCC, it is possible to express transaction isolation levels
almost solely as a set of different visibility rules for a version of
a value; rules that vary by isolation level.</p>
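<p>As a rough preview of what "visibility rules" can look like, here is a
sketch with two deliberately simplified rules. The <code>version</code>
type, the <code>committed</code> map, and both rule functions are
assumptions made for this example; they ignore details such as a
transaction seeing its own writes, and the rules implemented later in
this post are more careful.</p>

```go
package main

import "fmt"

// version is a hypothetical stand-in for one stored version of a value.
type version struct {
	createdBy uint64 // id of the transaction that created this version
	deletedBy uint64 // id of the transaction that deleted it, 0 if none
	value     string
}

// visibleDirty is the loosest rule: any version that has not been
// deleted is visible, even if its creator has not committed.
func visibleDirty(v version, committed map[uint64]bool) bool {
	return v.deletedBy == 0
}

// visibleCommitted only shows versions created by a committed
// transaction and not deleted by a committed transaction.
func visibleCommitted(v version, committed map[uint64]bool) bool {
	if !committed[v.createdBy] {
		return false
	}
	return v.deletedBy == 0 || !committed[v.deletedBy]
}

// read walks the versions from newest to oldest and returns the first
// one the given rule considers visible.
func read(versions []version, committed map[uint64]bool,
	visible func(version, map[uint64]bool) bool) (string, bool) {
	for i := len(versions) - 1; i >= 0; i-- {
		if visible(versions[i], committed) {
			return versions[i].value, true
		}
	}
	return "", false
}

// demo reads the same stored versions under both rules.
func demo() (dirty string, clean string) {
	// Transaction 1 committed. Transaction 2 is still in progress and
	// has replaced "old" with "new".
	committed := map[uint64]bool{1: true}
	versions := []version{
		{createdBy: 1, deletedBy: 2, value: "old"},
		{createdBy: 2, deletedBy: 0, value: "new"},
	}
	dirty, _ = read(versions, committed, visibleDirty)
	clean, _ = read(versions, committed, visibleCommitted)
	return dirty, clean
}

func main() {
	dirty, clean := demo()
	fmt.Println(dirty) // new
	fmt.Println(clean) // old
}
```

<p>The stored data is identical in both reads. Only the rule changed.</p>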
<p>So we will build up a general framework first and discuss and
implement each isolation level last.</p>
<h3 id="scaffolding">Scaffolding</h3><p>We'll build an in-memory key-value system that acts on transactions. I
usually try to stick with only the standard library for projects like
this but I really wanted a sorted data structure and Go's standard
library doesn't provide one.</p>
<p>In <code>main.go</code>, let's set up basic helpers for assertions and debugging:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"slices"</span>
<span class="w"> </span><span class="s">"github.com/tidwall/btree"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">b</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">assertEq</span><span class="p">[</span><span class="nx">C</span><span class="w"> </span><span class="kt">comparable</span><span class="p">](</span><span class="nx">a</span><span class="w"> </span><span class="nx">C</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">C</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"%s '%v' != '%v'"</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">var</span><span class="w"> </span><span class="nx">DEBUG</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">slices</span><span class="p">.</span><span class="nx">Contains</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">,</span><span class="w"> </span><span class="s">"--debug"</span><span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="o">...</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">DEBUG</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">append</span><span class="p">([]</span><span class="kt">any</span><span class="p">{</span><span class="s">"[DEBUG]"</span><span class="p">},</span><span class="w"> </span><span class="nx">a</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">args</span><span class="o">...</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>As mentioned previously, a value in the database will be defined with
start and end transaction ids.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Value</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">txStartId</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="nx">txEndId</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
</pre></div>
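<p>To get a feel for how these fields evolve, here is a small sketch of
one key's history. The <code>write</code> and <code>remove</code> helpers
are hypothetical and ignore isolation entirely; they also assume a
<code>txEndId</code> of <code>0</code> means the version has not been
deleted.</p>

```go
package main

import "fmt"

// Value mirrors the struct defined above.
type Value struct {
	txStartId uint64
	txEndId   uint64
	value     string
}

// remove marks the live version, if there is one, as deleted by txId.
// Assumption: txEndId == 0 means "not deleted yet".
func remove(versions []Value, txId uint64) []Value {
	for i := range versions {
		if versions[i].txEndId == 0 {
			versions[i].txEndId = txId
		}
	}
	return versions
}

// write ends the current live version and appends a new one. Old
// versions are never modified beyond setting txEndId.
func write(versions []Value, txId uint64, value string) []Value {
	versions = remove(versions, txId)
	return append(versions, Value{txStartId: txId, value: value})
}

func main() {
	var x []Value
	x = write(x, 1, "hey") // transaction 1 sets x
	x = write(x, 3, "yo")  // transaction 3 overwrites x
	x = remove(x, 4)       // transaction 4 deletes x

	for _, v := range x {
		fmt.Printf("%+v\n", v)
	}
}
```

<p>After these three operations the key has two versions: "hey" lived
from transaction 1 to 3, and "yo" lived from transaction 3 to 4. Nothing
was overwritten in place, which is what lets older transactions keep
reading older versions.</p>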
<p>Every transaction will be in an in-progress, aborted, or committed
state.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">TransactionState</span><span class="w"> </span><span class="kt">uint8</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">InProgressTransaction</span><span class="w"> </span><span class="nx">TransactionState</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>
<span class="w"> </span><span class="nx">AbortedTransaction</span>
<span class="w"> </span><span class="nx">CommittedTransaction</span>
<span class="p">)</span>
</pre></div>
<p>And we'll support a few major isolation levels.</p>
<div class="highlight"><pre><span></span><span class="c1">// Loosest isolation at the top, strictest isolation at the bottom.</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">IsolationLevel</span><span class="w"> </span><span class="kt">uint8</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">ReadUncommittedIsolation</span><span class="w"> </span><span class="nx">IsolationLevel</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>
<span class="w"> </span><span class="nx">ReadCommittedIsolation</span>
<span class="w"> </span><span class="nx">RepeatableReadIsolation</span>
<span class="w"> </span><span class="nx">SnapshotIsolation</span>
<span class="w"> </span><span class="nx">SerializableIsolation</span>
<span class="p">)</span>
</pre></div>
<p>We'll get into detail about the meaning of the levels later.</p>
<p>A transaction has an isolation level, an id (monotonically increasing
integer), and a current state. And although we won't make use of this
data yet, transactions at stricter isolation levels will need some
extra info. Specifically, stricter isolation levels need to know about
other transactions that were in-progress when this one started. And
stricter isolation levels need to know about all keys read and written
by a transaction.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Transaction</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">isolation</span><span class="w"> </span><span class="nx">IsolationLevel</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="nx">state</span><span class="w"> </span><span class="nx">TransactionState</span>
<span class="w"> </span><span class="c1">// Used only by Repeatable Read and stricter.</span>
<span class="w"> </span><span class="nx">inprogress</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">uint64</span><span class="p">]</span>
<span class="w"> </span><span class="c1">// Used only by Snapshot Isolation and stricter.</span>
<span class="w"> </span><span class="nx">writeset</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span>
<span class="w"> </span><span class="nx">readset</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span>
<span class="p">}</span>
</pre></div>
<p>We'll discuss why later.</p>
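<p>One small thing worth pointing out: the "and stricter" comments in the
struct lean on the order of the <code>IsolationLevel</code> constants.
Since they are declared with <code>iota</code> from loosest to strictest,
they can be compared as plain integers. A sketch, where both helper names
are hypothetical:</p>

```go
package main

import "fmt"

type IsolationLevel uint8

// Loosest isolation at the top, strictest isolation at the bottom.
const (
	ReadUncommittedIsolation IsolationLevel = iota
	ReadCommittedIsolation
	RepeatableReadIsolation
	SnapshotIsolation
	SerializableIsolation
)

// needsInprogress reports whether a level is Repeatable Read or
// stricter, the levels that use the inprogress set.
func needsInprogress(l IsolationLevel) bool {
	return l >= RepeatableReadIsolation
}

// needsReadWriteSets reports whether a level is Snapshot Isolation or
// stricter, the levels that use the read and write sets.
func needsReadWriteSets(l IsolationLevel) bool {
	return l >= SnapshotIsolation
}

func main() {
	for l := ReadUncommittedIsolation; l <= SerializableIsolation; l++ {
		fmt.Println(l, needsInprogress(l), needsReadWriteSets(l))
	}
}
```

<p>This only works as long as new levels are inserted in the right
position in the <code>const</code> block.</p>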
<p>Finally, the database itself will have a default isolation level that
each transaction will inherit (for our own convenience in tests).</p>
<p>The database will have a mapping of keys to an array of value
versions. Later elements in the array will represent newer versions of
a value.</p>
<p>The database will also store the next free transaction id it will use
to assign ids to new transactions.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Database</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="nx">IsolationLevel</span>
<span class="w"> </span><span class="nx">store</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="nx">Value</span>
<span class="w"> </span><span class="nx">transactions</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Map</span><span class="p">[</span><span class="kt">uint64</span><span class="p">,</span><span class="w"> </span><span class="nx">Transaction</span><span class="p">]</span>
<span class="w"> </span><span class="nx">nextTransactionId</span><span class="w"> </span><span class="kt">uint64</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span><span class="w"> </span><span class="nx">Database</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">Database</span><span class="p">{</span>
<span class="w"> </span><span class="nx">defaultIsolation</span><span class="p">:</span><span class="w"> </span><span class="nx">ReadCommittedIsolation</span><span class="p">,</span>
<span class="w"> </span><span class="nx">store</span><span class="p">:</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="nx">Value</span><span class="p">{},</span>
<span class="w"> </span><span class="c1">// The `0` transaction id will be used to mean that</span>
<span class="w"> </span><span class="c1">// the id was not set. So all valid transaction ids</span>
<span class="w"> </span><span class="c1">// must start at 1.</span>
<span class="w"> </span><span class="nx">nextTransactionId</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
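<p>Everything in this post runs on a single goroutine. If you did want to
share the database across goroutines, its mutable fields would need a
lock. As a minimal sketch of the idea for just the id counter, assuming a
hypothetical <code>idAllocator</code> type that is not part of this
project:</p>

```go
package main

import (
	"fmt"
	"sync"
)

// idAllocator is a hypothetical goroutine-safe version of the
// nextTransactionId counter.
type idAllocator struct {
	mu   sync.Mutex
	next uint64
}

func newIdAllocator() *idAllocator {
	// Start at 1 because 0 is reserved to mean "not set".
	return &idAllocator{next: 1}
}

// allocate returns the next free id. The mutex makes the
// read-then-increment step atomic across goroutines.
func (a *idAllocator) allocate() uint64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	id := a.next
	a.next++
	return id
}

// allocateConcurrently hands out n ids from n goroutines and reports
// whether every id was non-zero and unique.
func allocateConcurrently(n int) bool {
	a := newIdAllocator()
	ids := make([]uint64, n)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			ids[i] = a.allocate()
		}(i)
	}
	wg.Wait()

	seen := map[uint64]bool{}
	for _, id := range ids {
		if id == 0 || seen[id] {
			return false
		}
		seen[id] = true
	}
	return true
}

func main() {
	fmt.Println(allocateConcurrently(100))
}
```

<p>A real version would hold the same lock around accesses to
<code>store</code> and <code>transactions</code> too.</p>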
<p class="note">
To be thread-safe: <code>store</code>, <code>transactions</code>,
and <code>nextTransactionId</code> should be guarded by a mutex. But
to keep the code small, this post will not use goroutines and thus
does not need mutexes.
</p><p>There's a bit of book-keeping when creating a transaction, so we'll
make a dedicated method for this. We must give the new transaction an
id, record which transactions are currently in progress, and add it to
the database's transaction history.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">inprogress</span><span class="p">()</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">uint64</span><span class="p">]</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">uint64</span><span class="p">]</span>
<span class="w"> </span><span class="nx">iter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">transactions</span><span class="p">.</span><span class="nx">Iter</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">First</span><span class="p">();</span><span class="w"> </span><span class="nx">ok</span><span class="p">;</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Value</span><span class="p">().</span><span class="nx">state</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">InProgressTransaction</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ids</span><span class="p">.</span><span class="nx">Insert</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">())</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ids</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">newTransaction</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="nx">Transaction</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">Transaction</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">isolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">defaultIsolation</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">InProgressTransaction</span>
<span class="w"> </span><span class="c1">// Assign and increment transaction id.</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">nextTransactionId</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">nextTransactionId</span><span class="o">++</span>
<span class="w"> </span><span class="c1">// Store all inprogress transaction ids.</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">inprogress</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">inprogress</span><span class="p">()</span>
<span class="w"> </span><span class="c1">// Add this transaction to history.</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">transactions</span><span class="p">.</span><span class="nx">Set</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"starting transaction"</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">t</span>
<span class="p">}</span>
</pre></div>
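<p>To make the snapshot of in-progress ids a little more concrete, here
is a stripped-down sketch. It swaps the btree for plain Go maps and
slices purely for brevity, and the names <code>miniDB</code> and
<code>begin</code> are made up for this illustration. As in
<code>newTransaction</code>, the snapshot is taken before the new
transaction is added to history, so a transaction never lists itself.</p>

```go
package main

import (
	"fmt"
	"sort"
)

type state int

const (
	inProgress state = iota
	committed
	aborted
)

// miniDB is a hypothetical, simplified stand-in for Database.
type miniDB struct {
	nextID uint64
	states map[uint64]state
}

func newMiniDB() *miniDB {
	// Ids start at 1 so that 0 can mean "not set".
	return &miniDB{nextID: 1, states: map[uint64]state{}}
}

// begin assigns the next id and returns it along with a sorted copy of
// the ids that were in progress at that moment.
func (d *miniDB) begin() (uint64, []uint64) {
	var snapshot []uint64
	for id, s := range d.states {
		if s == inProgress {
			snapshot = append(snapshot, id)
		}
	}
	sort.Slice(snapshot, func(i, j int) bool { return snapshot[i] < snapshot[j] })

	id := d.nextID
	d.nextID++
	d.states[id] = inProgress
	return id, snapshot
}

func main() {
	db := newMiniDB()
	t1, _ := db.begin()
	t2, seenBy2 := db.begin()
	db.states[t1] = committed
	t3, seenBy3 := db.begin()
	fmt.Println(t2, seenBy2) // 2 [1]
	fmt.Println(t3, seenBy3) // 3 [2]
}
```

<p>The snapshot is a copy, so it keeps describing the moment the
transaction started even after other transactions complete.</p>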
<p>And we'll add a few more helpers for completing a transaction, for
fetching a transaction by id, and for validating a transaction.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">completeTransaction</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">Transaction</span><span class="p">,</span><span class="w"> </span><span class="nx">state</span><span class="w"> </span><span class="nx">TransactionState</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">"completing transaction "</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Update transactions.</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">state</span>
<span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">transactions</span><span class="p">.</span><span class="nx">Set</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">transactionState</span><span class="p">(</span><span class="nx">txId</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="nx">Transaction</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">transactions</span><span class="p">.</span><span class="nx">Get</span><span class="p">(</span><span class="nx">txId</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">ok</span><span class="p">,</span><span class="w"> </span><span class="s">"valid transaction"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">assertValidTransaction</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">Transaction</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="s">"valid id"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">d</span><span class="p">.</span><span class="nx">transactionState</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">).</span><span class="nx">state</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">InProgressTransaction</span><span class="p">,</span><span class="w"> </span><span class="s">"in progress"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>The final bit of scaffolding we'll set up is an abstraction for
database connections. A connection will have at most one associated
transaction. Users must ask the database for a new connection. Then
within the connection they can manage a transaction.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Connection</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">Transaction</span>
<span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">Connection</span><span class="p">)</span><span class="w"> </span><span class="nx">execCommand</span><span class="p">(</span><span class="nx">command</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="nx">command</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// TODO</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"unimplemented"</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">Connection</span><span class="p">)</span><span class="w"> </span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="nx">cmd</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="nx">cmd</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">"unexpected error"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">res</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">newConnection</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="nx">Connection</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Connection</span><span class="p">{</span>
<span class="w"> </span><span class="nx">db</span><span class="p">:</span><span class="w"> </span><span class="nx">d</span><span class="p">,</span>
<span class="w"> </span><span class="nx">tx</span><span class="p">:</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"unimplemented"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>And that's it for scaffolding. Now set up the Go module and make sure
this builds.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>gomvcc
<span class="go">go: creating new go.mod: module gomvcc</span>
<span class="go">go: to add module requirements and sums:</span>
<span class="go"> go mod tidy</span>
<span class="gp">$ </span>go<span class="w"> </span>mod<span class="w"> </span>tidy
<span class="go">go: finding module for package github.com/tidwall/btree</span>
<span class="go">go: found github.com/tidwall/btree in github.com/tidwall/btree v1.7.0</span>
<span class="gp">$ </span>go<span class="w"> </span>build
<span class="gp">$ </span>./gomvcc
<span class="go">panic: unimplemented</span>
<span class="go">goroutine 1 [running]:</span>
<span class="go">main.main()</span>
<span class="go"> /Users/phil/tmp/main.go:166 +0x2c</span>
</pre></div>
<p>Great!</p>
<h3 id="transaction-management">Transaction management</h3><p>When the user asks to begin a transaction, we ask the database for a
new transaction and assign it to the current connection.</p>
<div class="highlight"><pre><span></span><span class="w"> </span>func (c *Connection) execCommand(command string, args []string) (string, error) {
<span class="w"> </span> debug(command, args)
<span class="gi">+ if command == "begin" {</span>
<span class="gi">+ assertEq(c.tx, nil, "no running transactions")</span>
<span class="gi">+ c.tx = c.db.newTransaction()</span>
<span class="gi">+ c.db.assertValidTransaction(c.tx)</span>
<span class="gi">+ return fmt.Sprintf("%d", c.tx.id), nil</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="w"> </span> // TODO
<span class="w"> </span> return "", fmt.Errorf("unimplemented")
<span class="w"> </span>}
</pre></div>
<p>To abort a transaction, we call the <code>completeTransaction</code> method
(which makes sure the database transaction history gets updated) with
the <code>AbortedTransaction</code> state.</p>
<div class="highlight"><pre><span></span><span class="w"> </span> return fmt.Sprintf("%d", c.tx.id), nil
<span class="w"> </span> }
<span class="gi">+ if command == "abort" {</span>
<span class="gi">+ c.db.assertValidTransaction(c.tx)</span>
<span class="gi">+ err := c.db.completeTransaction(c.tx, AbortedTransaction)</span>
<span class="gi">+ c.tx = nil</span>
<span class="gi">+ return "", err</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="w"> </span> // TODO
<span class="w"> </span> return "", fmt.Errorf("unimplemented")
<span class="w"> </span>}
</pre></div>
<p>Committing a transaction is similar.</p>
<div class="highlight"><pre><span></span><span class="w"> </span> return "", err
<span class="w"> </span> }
<span class="gi">+ if command == "commit" {</span>
<span class="gi">+ c.db.assertValidTransaction(c.tx)</span>
<span class="gi">+ err := c.db.completeTransaction(c.tx, CommittedTransaction)</span>
<span class="gi">+ c.tx = nil</span>
<span class="gi">+ return "", err</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="w"> </span> // TODO
<span class="w"> </span> return "", fmt.Errorf("unimplemented")
<span class="w"> </span>}
</pre></div>
<p>The neat thing about MVCC is that beginning, committing, and aborting
a transaction are purely metadata work. Committing a transaction will get a
bit more complex when we add support for Snapshot Isolation and
Serializable Isolation, but we'll get to that later. Even then, it
will not involve modifying any values we get, set, or delete.</p>
<h3 id="get,-set,-delete">Get, set, delete</h3><p>Here is where things get fun. As mentioned earlier, the key-value
store is actually <code>map[string][]Value</code>, with the more recent versions
of a value at the end of the list of values for the key.</p>
<p>For <code>get</code> support, we'll iterate the list of value versions backwards
for the key. And we'll call a special new <code>isvisible</code> method to
determine if this transaction can see this value. The first value that
passes the <code>isvisible</code> test is the correct value for the transaction.</p>
<div class="highlight"><pre><span></span><span class="w"> </span> return "", err
<span class="w"> </span> }
<span class="gi">+ if command == "get" {</span>
<span class="gi">+ c.db.assertValidTransaction(c.tx)</span>
<span class="gi">+</span>
<span class="gi">+ key := args[0]</span>
<span class="gi">+</span>
<span class="gi">+ c.tx.readset.Insert(key)</span>
<span class="gi">+</span>
<span class="gi">+ for i := len(c.db.store[key]) - 1; i >= 0; i-- {</span>
<span class="gi">+ value := c.db.store[key][i]</span>
<span class="gi">+ debug(value, c.tx, c.db.isvisible(c.tx, value))</span>
<span class="gi">+ if c.db.isvisible(c.tx, value) {</span>
<span class="gi">+ return value.value, nil</span>
<span class="gi">+ }</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ return "", fmt.Errorf("cannot get key that does not exist")</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="w"> </span> // TODO
<span class="w"> </span> return "", fmt.Errorf("unimplemented")
<span class="w"> </span>}
</pre></div>
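<p>To see the backwards walk on its own, here is a small self-contained
sketch. The <code>version</code> type mirrors the fields of <code>Value</code>,
and the visibility check is passed in as a function. The <code>asOf</code>
rule is only a placeholder invented for this example (it ignores whether
transactions committed) and is not one of the real isolation level
rules.</p>

```go
package main

import "fmt"

type version struct {
	txStartId uint64
	txEndId   uint64 // 0 means no transaction has ended this version
	value     string
}

// latestVisible walks the chain from newest to oldest and returns the
// first version accepted by the visible predicate.
func latestVisible(chain []version, visible func(version) bool) (string, bool) {
	for i := len(chain) - 1; i >= 0; i-- {
		if visible(chain[i]) {
			return chain[i].value, true
		}
	}
	return "", false
}

// asOf is a placeholder rule for this sketch: a version is visible to
// transaction id if it started at or before id and had not been ended
// at or before id.
func asOf(id uint64) func(version) bool {
	return func(v version) bool {
		started := v.txStartId <= id
		ended := v.txEndId != 0 && v.txEndId <= id
		return started && !ended
	}
}

func main() {
	chain := []version{
		{txStartId: 1, txEndId: 3, value: "old"},
		{txStartId: 3, txEndId: 0, value: "new"},
	}
	fmt.Println(latestVisible(chain, asOf(2))) // old true
	fmt.Println(latestVisible(chain, asOf(3))) // new true
}
```

<p>Because the newest versions sit at the end of the slice, walking
backwards means the first hit is the most recent version the transaction
is allowed to see.</p>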
<p>I snuck in tracking which keys are read, and we'll also soon sneak in
tracking which keys are written. This is necessary in stricter
isolation levels. More on that later.</p>
<p><code>set</code> and <code>delete</code> are similar to <code>get</code>. But this time when we walk the
list of value versions, we will set the <code>txEndId</code> for the value to the
current transaction id if the value version is visible to this
transaction.</p>
<p>Then, for <code>set</code>, we'll append to the value version list with the new
version of the value that starts at this current transaction.</p>
<div class="highlight"><pre><span></span><span class="w"> </span> return "", err
<span class="w"> </span> }
<span class="gi">+ if command == "set" || command == "delete" {</span>
<span class="gi">+ c.db.assertValidTransaction(c.tx)</span>
<span class="gi">+</span>
<span class="gi">+ key := args[0]</span>
<span class="gi">+</span>
<span class="gi">+ // Mark all visible versions as now invalid.</span>
<span class="gi">+ found := false</span>
<span class="gi">+ for i := len(c.db.store[key]) - 1; i >= 0; i-- {</span>
<span class="gi">+ value := &c.db.store[key][i]</span>
<span class="gi">+ debug(value, c.tx, c.db.isvisible(c.tx, *value))</span>
<span class="gi">+ if c.db.isvisible(c.tx, *value) {</span>
<span class="gi">+ value.txEndId = c.tx.id</span>
<span class="gi">+ found = true</span>
<span class="gi">+ }</span>
<span class="gi">+ }</span>
<span class="gi">+ if command == "delete" && !found {</span>
<span class="gi">+ return "", fmt.Errorf("cannot delete key that does not exist")</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ c.tx.writeset.Insert(key)</span>
<span class="gi">+</span>
<span class="gi">+ // And add a new version if it's a set command.</span>
<span class="gi">+ if command == "set" {</span>
<span class="gi">+ value := args[1]</span>
<span class="gi">+ c.db.store[key] = append(c.db.store[key], Value{</span>
<span class="gi">+ txStartId: c.tx.id,</span>
<span class="gi">+ txEndId: 0,</span>
<span class="gi">+ value: value,</span>
<span class="gi">+ })</span>
<span class="gi">+</span>
<span class="gi">+ return value, nil</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ // Delete ok.</span>
<span class="gi">+ return "", nil</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="w"> </span> if command == "get" {
<span class="w"> </span> c.db.assertValidTransaction(c.tx)
</pre></div>
<p>This time rather than modifying the <code>readset</code> we modify the <code>writeset</code>
for the transaction.</p>
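<p>It can help to watch what happens to a single key's version list
over a few writes. The sketch below is a simplified illustration, not the
code from this post: <code>set</code> and <code>del</code> are standalone
helpers, and <code>live</code> is a stand-in visibility rule that treats
any version with no <code>txEndId</code> as visible.</p>

```go
package main

import (
	"errors"
	"fmt"
)

type version struct {
	txStartId uint64
	txEndId   uint64
	value     string
}

// live is a stand-in visibility rule for this sketch only: a version is
// visible if no transaction has ended it.
func live(v version) bool { return v.txEndId == 0 }

// set ends every visible version and appends a new one that starts at
// the writing transaction.
func set(chain []version, txId uint64, value string) []version {
	for i := range chain {
		if live(chain[i]) {
			chain[i].txEndId = txId
		}
	}
	return append(chain, version{txStartId: txId, value: value})
}

// del ends every visible version, failing if none was visible.
func del(chain []version, txId uint64) error {
	found := false
	for i := range chain {
		if live(chain[i]) {
			chain[i].txEndId = txId
			found = true
		}
	}
	if !found {
		return errors.New("cannot delete key that does not exist")
	}
	return nil
}

func main() {
	var chain []version
	chain = set(chain, 1, "a")
	chain = set(chain, 2, "b")
	err := del(chain, 3)
	fmt.Println(chain, err) // [{1 2 a} {2 3 b}] <nil>

	// Nothing is visible any more, so a second delete fails.
	fmt.Println(del(chain, 4))
}
```

<p>Nothing is ever removed from the list. Old versions just get a
<code>txEndId</code>, and a delete is simply an end marker with no new
version appended.</p>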
<p>And that is how commands get executed!</p>
<p>Let's zoom in to the core of the problem we have mentioned but not
implemented: MVCC visibility rules and how they differ by isolation
levels.</p>
<h3 id="isolation-levels-and-mvcc-visibility-rules">Isolation levels and MVCC visibility rules</h3><p>To varying degrees, transaction isolation levels prevent concurrent
transactions from messing with each other. The looser isolation levels
hardly prevent this at all.</p>
<p>Here is what the <a href="https://web.cecs.pdx.edu/~len/sql1999.pdf">1999 ANSI SQL
standard</a> (page 84) has to
say.</p>
<p><img src="/sql99isolation.png" alt="/sql99isolation.png"></p>
<p>But as I mentioned at the beginning of the post, we're going to be a
bit informal. And we'll mostly refer to the
<a href="https://jepsen.io/consistency">Jepsen</a> summaries of each isolation
level.</p>
<h4 id="read-uncommitted">Read Uncommitted</h4><p>According to
<a href="https://jepsen.io/consistency/models/read-uncommitted">Jepsen</a>, the
loosest isolation level, Read Uncommitted, has almost no
restrictions. We simply read the most recent (non-deleted) version
of a value, regardless of whether the transaction that set it has
committed, aborted, or is still in progress.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">isvisible</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">Transaction</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="nx">Value</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Read Uncommitted means we simply read the last value</span>
<span class="w"> </span><span class="c1">// written. Even if the transaction that wrote this value has</span>
<span class="w"> </span><span class="c1">// not committed, and even if it has aborted.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">isolation</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">ReadUncommittedIsolation</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// We must merely make sure the value has not been</span>
<span class="w"> </span><span class="c1">// deleted.</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">txEndId</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="s">"unsupported isolation level"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span>
<span class="p">}</span>
</pre></div>
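<p>To spell out what "even if it has aborted" means in practice, here is
a small standalone sketch. The types and the
<code>readUncommittedGet</code> helper are made up for this example, and
the history is hypothetical: transaction 1 committed <code>"a"</code>,
then transaction 2 overwrote it with <code>"b"</code> and aborted.</p>

```go
package main

import "fmt"

type txState int

const (
	inProgress txState = iota
	committed
	aborted
)

type version struct {
	txStartId uint64
	txEndId   uint64
	value     string
}

// visibleReadUncommitted mirrors the rule above: only txEndId matters,
// the state of the writing transaction is never consulted.
func visibleReadUncommitted(v version) bool {
	return v.txEndId == 0
}

// readUncommittedGet returns the newest version that passes the rule.
func readUncommittedGet(chain []version) (string, bool) {
	for i := len(chain) - 1; i >= 0; i-- {
		if visibleReadUncommitted(chain[i]) {
			return chain[i].value, true
		}
	}
	return "", false
}

func main() {
	states := map[uint64]txState{1: committed, 2: aborted}
	chain := []version{
		{txStartId: 1, txEndId: 2, value: "a"},
		{txStartId: 2, txEndId: 0, value: "b"},
	}
	value, ok := readUncommittedGet(chain)
	fmt.Println(value, ok, states[2] == aborted) // b true true
}
```

<p>The reader gets <code>"b"</code> even though its writer aborted, and
the committed <code>"a"</code> stays hidden because the aborted
transaction already marked it as ended.</p>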
<p>Let's write a test that demonstrates this. We create two connections,
<code>c1</code> and <code>c2</code>, begin a transaction in each, and set a key in <code>c1</code>.
The value set for the key in <code>c1</code> should be immediately visible if <code>c2</code>
asks for that key. In <code>main_test.go</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"testing"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">TestReadUncommitted</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">database</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span>
<span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ReadUncommittedIsolation</span>
<span class="w"> </span><span class="nx">c1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">})</span>
<span class="w"> </span><span class="c1">// Update is visible to self.</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">,</span><span class="w"> </span><span class="s">"c1 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// But since read uncommitted, also available to everyone else.</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// And if we delete, that should be respected.</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"delete"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c1 delete x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c1 sees no x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"cannot get key that does not exist"</span><span class="p">,</span><span class="w"> </span><span class="s">"c1 sees no x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 sees no x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"cannot get key that does not exist"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 sees no x"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p class="note">
Thank you to @glaebhoerl
for <a href="https://twitter.com/glaebhoerl/status/1792912649304388043">pointing
out</a> that in an earlier version of this post, Read Uncommitted
incorrectly made deleted values visible.
</p><p>That's pretty simple! But also pretty useless if your workload has
conflicts. If you can arrange your workload in a way where you know no
concurrent transactions will ever read or write conflicting keys
though, this could be pretty efficient! The rules will only get more
complex (and thus potentially more of a bottleneck) from here on.</p>
<p>But for the most part, people don't use this isolation level. SQLite,
Yugabyte, Cockroach, and Postgres <a href="https://github.com/ept/hermitage?tab=readme-ov-file#summary-of-test-results">don't
even</a>
implement it. It is also not the default for any major database that
does implement it.</p>
<p>Let's get a little stricter.</p>
<h4 id="read-committed">Read Committed</h4><p>We'll pull again from <a href="https://jepsen.io/consistency/models/read-committed">Jepsen</a>:</p>
<blockquote><p>Read committed is a consistency model which strengthens read
uncommitted by preventing dirty reads: transactions are not allowed
to observe writes from transactions which do not commit.</p>
</blockquote>
<p>This sounds pretty simple. In <code>isvisible</code> we'll make sure that the
value has a <code>txStartId</code> that is either this transaction or a
transaction that has committed. Moreover we will now begin checking
against <code>txEndId</code> to make sure the value wasn't deleted by any
relevant transaction.</p>
<div class="highlight"><pre><span></span><span class="w"> </span> return value.txEndId == 0
<span class="w"> </span> }
<span class="gi">+ // Read Committed means we are allowed to read any values that</span>
<span class="gi">+ // are committed at the point in time where we read.</span>
<span class="gi">+ if t.isolation == ReadCommittedIsolation {</span>
<span class="gi">+ // If the value was created by a transaction that is</span>
<span class="gi">+ // not committed, and not this current transaction,</span>
<span class="gi">+ // it's no good.</span>
<span class="gi">+ if value.txStartId != t.id &&</span>
<span class="gi">+ d.transactionState(value.txStartId).state != CommittedTransaction {</span>
<span class="gi">+ return false</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ // If the value was deleted in this transaction, it's no good.</span>
<span class="gi">+ if value.txEndId == t.id {</span>
<span class="gi">+ return false</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ // Or if the value was deleted in some other committed</span>
<span class="gi">+ // transaction, it's no good.</span>
<span class="gi">+ if value.txEndId > 0 &&</span>
<span class="gi">+ d.transactionState(value.txEndId).state == CommittedTransaction {</span>
<span class="gi">+ return false</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ // Otherwise the value is good.</span>
<span class="gi">+ return true</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="w"> </span> assert(false, "unsupported isolation level")
<span class="w"> </span> return false
<span class="w"> </span>}
</pre></div>
<p>This begins to look useful! We will never read a value that isn't part
of a committed transaction (or isn't part of our own
transaction). Indeed this is the
<a href="https://github.com/ept/hermitage">default</a> isolation level for many
databases including Postgres, Yugabyte, Oracle, and SQL Server.</p>
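<p>To poke at these rules in isolation, here is a minimal standalone
sketch of the Read Committed check. The names <code>version</code>,
<code>txState</code>, and <code>visibleReadCommitted</code> are
hypothetical and not part of the project's code. The
<code>states</code> map is an assumed stand-in for
<code>d.transactionState</code>.</p>

```go
package main

import "fmt"

type txState int

const (
	inProgress txState = iota
	committed
	aborted
)

// version is a hypothetical stand-in for a stored value version.
type version struct {
	txStartId uint64 // transaction that created this version
	txEndId   uint64 // transaction that deleted it, or 0 if never deleted
}

// visibleReadCommitted reports whether transaction `me` may read v
// under Read Committed, given the current state of every transaction.
func visibleReadCommitted(states map[uint64]txState, me uint64, v version) bool {
	// Created by someone else who has not committed: a dirty read.
	if v.txStartId != me && states[v.txStartId] != committed {
		return false
	}
	// Deleted by this transaction.
	if v.txEndId == me {
		return false
	}
	// Deleted by another transaction that has committed.
	if v.txEndId > 0 && states[v.txEndId] == committed {
		return false
	}
	return true
}

func main() {
	// Assumed scenario: tx 1 committed, tx 2 and 4 in progress, tx 3 aborted.
	states := map[uint64]txState{1: committed, 2: inProgress, 3: aborted, 4: inProgress}
	cases := []struct {
		name string
		me   uint64
		v    version
	}{
		{"created by committed tx", 4, version{txStartId: 1}},
		{"created by in-progress tx", 4, version{txStartId: 2}},
		{"created by aborted tx", 4, version{txStartId: 3}},
		{"own uncommitted write", 2, version{txStartId: 2}},
		{"deleted by this tx", 4, version{txStartId: 1, txEndId: 4}},
		{"deleted by in-progress tx", 4, version{txStartId: 1, txEndId: 2}},
	}
	for _, c := range cases {
		fmt.Printf("%-28s visible=%v\n", c.name, visibleReadCommitted(states, c.me, c.v))
	}
}
```

<p>Note that a delete made by a transaction that is still in progress
does not hide the value from anyone else. Only a committed delete (or
our own) does.</p>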
<p>Let's add a test to <code>main_test.go</code>. This is a bit long, but give it a
slow read. It is thoroughly commented.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">TestReadCommitted</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">database</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span>
<span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ReadCommittedIsolation</span>
<span class="w"> </span><span class="nx">c1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Local change is visible locally.</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">,</span><span class="w"> </span><span class="s">"c1 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Update not available to this transaction since this is not</span>
<span class="w"> </span><span class="c1">// committed.</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"cannot get key that does not exist"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"commit"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Now that it's been committed, it's visible in c2.</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c3</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Local change is visible locally.</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="s">"yall"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"yall"</span><span class="p">,</span><span class="w"> </span><span class="s">"c3 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// But not on the other connection, again.</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"abort"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// And still not, if the other transaction aborted.</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// And if we delete it, it should show up deleted locally.</span>
<span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"delete"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"cannot get key that does not exist"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"commit"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// It should also show up as deleted in new transactions now</span>
<span class="w"> </span><span class="c1">// that it has been committed.</span>
<span class="w"> </span><span class="nx">c4</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c4 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"cannot get key that does not exist"</span><span class="p">,</span><span class="w"> </span><span class="s">"c4 get x"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Again this seems great. However! You can easily get inconsistent data
within a transaction at this isolation level. If transaction A has
multiple statements, it can see different results per statement even
if A itself did not modify any data: another transaction B may
have committed changes between two of A's statements.</p>
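<p>Here is a toy model of that anomaly. It is not the project's code:
the <code>store</code> type and its methods are hypothetical, and the
store only tracks whatever is committed right now, which is roughly
what each statement gets to see under Read Committed.</p>

```go
package main

import "fmt"

// store is a hypothetical key-value store holding only committed values.
type store struct {
	committed map[string]string
}

// readCommittedGet returns whatever is committed at the moment of the
// call. Each statement in a Read Committed transaction reads this way.
func (s *store) readCommittedGet(key string) string {
	return s.committed[key]
}

// commitSet models a different transaction writing a key and committing.
func (s *store) commitSet(key, val string) {
	s.committed[key] = val
}

// nonRepeatableRead has transaction A read x twice while transaction B
// commits a change to x in between the two reads.
func nonRepeatableRead() (first, second string) {
	s := &store{committed: map[string]string{"x": "hey"}}

	first = s.readCommittedGet("x")  // A, statement 1
	s.commitSet("x", "yall")         // B writes x and commits
	second = s.readCommittedGet("x") // A, statement 2

	return first, second
}

func main() {
	first, second := nonRepeatableRead()
	fmt.Printf("A's first read: %q, A's second read: %q\n", first, second)
}
```

<p>Transaction A never wrote to <code>x</code>, yet its two reads
disagree. This is the non-repeatable read that the next isolation level
rules out.</p>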
<p>Let's get a little stricter.</p>
<h4 id="repeatable-read">Repeatable Read</h4><p>Again as Jepsen says, Repeatable Read is the same as Read Committed
but with the following anomaly not allowed (quoting from the <a href="https://web.cecs.pdx.edu/~len/sql1999.pdf">ANSI SQL
1999 standard</a>):</p>
<blockquote><p>P2 (“Non-repeatable read”): SQL-transaction T1 reads a
row. SQL-transaction T2 then modifies or deletes that row and
performs a COMMIT. If T1 then attempts to reread the row, it may
receive the modified value or discover that the row has been
deleted.</p>
</blockquote>
<p>To support this, we will add checks on top of the Read Committed
logic so that creations and deletions only count if they come from this
transaction itself or from a transaction that committed before this
transaction started.</p>
<p>As it happens, this is the same logic that will be necessary for
Snapshot Isolation and Serializable Isolation. The additional logic
(that makes Snapshot Isolation and Serializable Isolation different)
happens at commit time.</p>
<div class="highlight"><pre><span></span><span class="w"> </span> return true
<span class="w"> </span> }
<span class="gd">- assert(false, "unsupported isolation level")</span>
<span class="gd">- return false</span>
<span class="gi">+ // Repeatable Read, Snapshot Isolation, and Serializable</span>
<span class="gi">+ // further restrict Read Committed so only versions from</span>
<span class="gi">+ // transactions that completed before this one started are</span>
<span class="gi">+ // visible.</span>
<span class="gi">+</span>
<span class="gi">+ // Snapshot Isolation and Serializable will do additional</span>
<span class="gi">+ // checks at commit time.</span>
<span class="gi">+ assert(t.isolation == RepeatableReadIsolation ||</span>
<span class="gi">+ t.isolation == SnapshotIsolation ||</span>
<span class="gi">+ t.isolation == SerializableIsolation, "invalid isolation level")</span>
<span class="gi">+ // Ignore values from transactions started after this one.</span>
<span class="gi">+ if value.txStartId > t.id {</span>
<span class="gi">+ return false</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ // Ignore values created from transactions in progress when</span>
<span class="gi">+ // this one started.</span>
<span class="gi">+ if t.inprogress.Contains(value.txStartId) {</span>
<span class="gi">+ return false</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ // If the value was created by a transaction that is not</span>
<span class="gi">+ // committed, and not this current transaction, it's no good.</span>
<span class="gi">+ if d.transactionState(value.txStartId).state != CommittedTransaction &&</span>
<span class="gi">+ value.txStartId != t.id {</span>
<span class="gi">+ return false</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ // If the value was deleted in this transaction, it's no good.</span>
<span class="gi">+ if value.txEndId == t.id {</span>
<span class="gi">+ return false</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ // Or if the value was deleted in some other committed</span>
<span class="gi">+ // transaction that started before this one, it's no good.</span>
<span class="gi">+ if value.txEndId < t.id &&</span>
<span class="gi">+ value.txEndId > 0 &&</span>
<span class="gi">+ d.transactionState(value.txEndId).state == CommittedTransaction &&</span>
<span class="gi">+ !t.inprogress.Contains(value.txEndId) {</span>
<span class="gi">+ return false</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ return true</span>
<span class="w"> </span>}
<span class="w"> </span>type Connection struct {
</pre></div>
<p>How did I derive these rules? Mostly by writing tests that should pass
or fail and seeing what doesn't make sense. I tried to steal from
existing projects but these rules were not so simple to
discover, which is part of what I hope makes this project particularly
useful to look at.</p>
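<p>If you want to experiment with the rules above outside of the full
database, here is a minimal sketch that restates them as a standalone
function and runs a few cases through it. The <code>tx</code>,
<code>version</code>, and <code>visibleSnapshot</code> names are
hypothetical, the <code>states</code> map is an assumed stand-in for
<code>d.transactionState</code>, and <code>inprogress</code> is modeled
as a plain map rather than the set type the project uses.</p>

```go
package main

import "fmt"

type txState int

const (
	inProgress txState = iota
	committed
	aborted
)

// version is a hypothetical stand-in for a stored value version.
type version struct {
	txStartId uint64 // transaction that created this version
	txEndId   uint64 // transaction that deleted it, or 0 if never deleted
}

// tx is a hypothetical stand-in for a running transaction.
type tx struct {
	id         uint64
	inprogress map[uint64]bool // other transactions running when this one began
}

// visibleSnapshot restates the Repeatable Read visibility rules.
func visibleSnapshot(states map[uint64]txState, t tx, v version) bool {
	// Created by a transaction that started after this one.
	if v.txStartId > t.id {
		return false
	}
	// Created by a transaction that was running when this one began.
	if t.inprogress[v.txStartId] {
		return false
	}
	// Created by another transaction that never committed.
	if states[v.txStartId] != committed && v.txStartId != t.id {
		return false
	}
	// Deleted by this transaction.
	if v.txEndId == t.id {
		return false
	}
	// Deleted by a transaction that committed before this one began.
	if v.txEndId > 0 && v.txEndId < t.id &&
		states[v.txEndId] == committed && !t.inprogress[v.txEndId] {
		return false
	}
	return true
}

func main() {
	// Assumed history: tx 1 set x and committed while tx 2 was running.
	// tx 3 then overwrote x (ending tx 1's version) and aborted.
	states := map[uint64]txState{1: committed, 2: inProgress, 3: aborted, 4: inProgress}
	t2 := tx{id: 2, inprogress: map[uint64]bool{1: true}}
	t4 := tx{id: 4, inprogress: map[uint64]bool{2: true}}
	fromTx1 := version{txStartId: 1, txEndId: 3}
	fromTx3 := version{txStartId: 3}

	fmt.Println("tx 2 sees tx 1's version:", visibleSnapshot(states, t2, fromTx1))
	fmt.Println("tx 4 sees tx 1's version:", visibleSnapshot(states, t4, fromTx1))
	fmt.Println("tx 4 sees tx 3's version:", visibleSnapshot(states, t4, fromTx3))
}
```

<p>Adding a row to a scenario like this and asking "should that be
visible?" is a cheap way to find a rule that is missing or too
strict.</p>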
<p>Let's write a test for Repeatable Read. Again, the test is long but
well commented.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">TestRepeatableRead</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">database</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span>
<span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">RepeatableReadIsolation</span>
<span class="w"> </span><span class="nx">c1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Local change is visible locally.</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">,</span><span class="w"> </span><span class="s">"c1 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Update not available to this transaction since this is not</span>
<span class="w"> </span><span class="c1">// committed.</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"cannot get key that does not exist"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"commit"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Even after committing, it's not visible in an existing</span>
<span class="w"> </span><span class="c1">// transaction.</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"cannot get key that does not exist"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// But is available in a new transaction.</span>
<span class="w"> </span><span class="nx">c3</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">,</span><span class="w"> </span><span class="s">"c3 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Local change is visible locally.</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="s">"yall"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"yall"</span><span class="p">,</span><span class="w"> </span><span class="s">"c3 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// But not on the other connection, again.</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"cannot get key that does not exist"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"abort"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// And still not, regardless of abort, because it's an older</span>
<span class="w"> </span><span class="c1">// transaction.</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"cannot get key that does not exist"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// And the aborted set is still not visible to a new</span>
<span class="w"> </span><span class="c1">// transaction.</span>
<span class="w"> </span><span class="nx">c4</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">,</span><span class="w"> </span><span class="s">"c4 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"delete"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"commit"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// But the delete is visible to new transactions now that this</span>
<span class="w"> </span><span class="c1">// has been committed.</span>
<span class="w"> </span><span class="nx">c5</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c5</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c5</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c5 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"cannot get key that does not exist"</span><span class="p">,</span><span class="w"> </span><span class="s">"c5 get x"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Let's get stricter!</p>
<h4 id="snapshot-isolation">Snapshot Isolation</h4><p>Back to
<a href="https://jepsen.io/consistency/models/snapshot-isolation">Jepsen</a> for a
definition:</p>
<blockquote><p>In a snapshot isolated system, each transaction appears to operate
on an independent, consistent snapshot of the database. Its changes
are visible only to that transaction until commit time, when all
changes become visible atomically to any transaction which begins at
a later time. If transaction T1 has modified an object x, and
another transaction T2 committed a write to x after T1’s snapshot
began, and before T1’s commit, then T1 must abort.</p>
</blockquote>
<p>So Snapshot Isolation is the same as Repeatable Read but with one
additional rule: the keys written by any two concurrent committed
transactions must not overlap.</p>
<p>This is why we tracked <code>writeset</code>. Every time a transaction modified
or deleted a key, we added it to the transaction's <code>writeset</code>. To make
sure we abort correctly, we'll add a conflict check to the commit
step. (This idea is also well documented in <a href="https://dl.acm.org/doi/abs/10.1145/2168836.2168853">A critique of snapshot
isolation</a>. This
paper can be hard to find. Email me if you want a copy.)</p>
<p>When a transaction A goes to commit, it will run a conflict test for
any transaction B that has committed since this transaction A started.</p>
<p>Serializable Isolation is going to have a similar check. So we'll add
a helper for iterating through all relevant transactions, running a
check function for any transaction that has committed.</p>
<div class="highlight"><pre><span></span>func (d *Database) hasConflict(t1 *Transaction, conflictFn func(*Transaction, *Transaction) bool) bool {
<span class="w"> </span> iter := d.transactions.Iter()
<span class="w"> </span> // First see if there is any conflict with transactions that
<span class="w"> </span> // were in progress when this one started.
<span class="w"> </span> inprogressIter := t1.inprogress.Iter()
<span class="w"> </span> for ok := inprogressIter.First(); ok; ok = inprogressIter.Next() {
<span class="w"> </span> id := inprogressIter.Key()
<span class="w"> </span> found := iter.Seek(id)
<span class="w"> </span> if !found {
<span class="w"> </span> continue
<span class="w"> </span> }
<span class="w"> </span> t2 := iter.Value()
<span class="w"> </span> if t2.state == CommittedTransaction {
<span class="w"> </span> if conflictFn(t1, &t2) {
<span class="w"> </span> return true
<span class="w"> </span> }
<span class="w"> </span> }
<span class="w"> </span> }
<span class="w"> </span> // Then see if there is any conflict with transactions that
<span class="w"> </span> // started and committed after this one started.
<span class="w"> </span> for id := t1.id; id < d.nextTransactionId; id++ {
<span class="w"> </span> found := iter.Seek(id)
<span class="w"> </span> if !found {
<span class="w"> </span> continue
<span class="w"> </span> }
<span class="w"> </span> t2 := iter.Value()
<span class="w"> </span> if t2.state == CommittedTransaction {
<span class="w"> </span> if conflictFn(t1, &t2) {
<span class="w"> </span> return true
<span class="w"> </span> }
<span class="w"> </span> }
<span class="w"> </span> }
<span class="w"> </span> return false
}
</pre></div>
<p>It was around this point that I decided I did really need a B-Tree
implementation and could not just stick to vanilla Go data structures.</p>
<p>Now we can modify <code>completeTransaction</code> to do this check if the
transaction intends to commit. If the current transaction A's write
set intersects with any other transaction B committed since
transaction A started, we must abort.</p>
<div class="highlight"><pre><span></span><span class="w"> </span>func (d *Database) completeTransaction(t *Transaction, state TransactionState) error {
<span class="w"> </span> debug("completing transaction ", t.id)
<span class="gi">+</span>
<span class="gi">+ if state == CommittedTransaction {</span>
<span class="gi">+ // Snapshot Isolation imposes the additional constraint that</span>
<span class="gi">+ // no transaction A may commit after writing any of the same</span>
<span class="gi">+ // keys as transaction B has written and committed during</span>
<span class="gi">+ // transaction A's life.</span>
<span class="gi">+ if t.isolation == SnapshotIsolation && d.hasConflict(t, func(t1 *Transaction, t2 *Transaction) bool {</span>
<span class="gi">+ return setsShareItem(t1.writeset, t2.writeset)</span>
<span class="gi">+ }) {</span>
<span class="gi">+ d.completeTransaction(t, AbortedTransaction)</span>
<span class="gi">+ return fmt.Errorf("write-write conflict")</span>
<span class="gi">+ }</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="w"> </span> // Update transactions.
<span class="w"> </span> t.state = state
<span class="w"> </span> d.transactions.Set(t.id, *t)
</pre></div>
<p>Lastly, the definition of <code>setsShareItem</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">setsShareItem</span><span class="p">(</span><span class="nx">s1</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">string</span><span class="p">],</span><span class="w"> </span><span class="nx">s2</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">string</span><span class="p">])</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s1Iter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s1</span><span class="p">.</span><span class="nx">Iter</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s2Iter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s2</span><span class="p">.</span><span class="nx">Iter</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s1Iter</span><span class="p">.</span><span class="nx">First</span><span class="p">();</span><span class="w"> </span><span class="nx">ok</span><span class="p">;</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s1Iter</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s1Key</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s1Iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">()</span>
<span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s2Iter</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="nx">s1Key</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span>
<span class="p">}</span>
</pre></div>
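<p>If you want to poke at the intersection check without pulling in a
B-Tree library, here is a minimal sketch of the same idea using plain
Go maps as sets. The names <code>stringSet</code>, <code>newStringSet</code>, and
<code>setsShareItemMap</code> are made up for this illustration and are not
part of the implementation above.</p>

```go
package main

import "fmt"

// stringSet is a hypothetical stand-in for btree.Set[string].
type stringSet map[string]struct{}

// newStringSet builds a set from the given keys.
func newStringSet(keys ...string) stringSet {
	s := stringSet{}
	for _, k := range keys {
		s[k] = struct{}{}
	}
	return s
}

// setsShareItemMap reports whether the two sets have at least one
// key in common.
func setsShareItemMap(s1, s2 stringSet) bool {
	// Walk the smaller set and probe the larger one.
	if len(s2) < len(s1) {
		s1, s2 = s2, s1
	}
	for k := range s1 {
		if _, ok := s2[k]; ok {
			return true
		}
	}
	return false
}

func main() {
	a := newStringSet("x", "y")
	b := newStringSet("y", "z")
	c := newStringSet("w")
	fmt.Println(setsShareItemMap(a, b)) // true: both contain "y"
	fmt.Println(setsShareItemMap(a, c)) // false: nothing in common
}
```

<p>The map version loses the ordered iteration a B-Tree gives you, but
for a yes-or-no overlap check that ordering is not needed.</p>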
<p>Since Snapshot Isolation shares all the same visibility rules as
Repeatable Read, the tests get to be a little simpler! We'll simply
test that two transactions attempting to commit a write to the same
key fail. Or specifically: that the second transaction cannot commit.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">TestSnapshotIsolation_writewrite_conflict</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">database</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span>
<span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">SnapshotIsolation</span>
<span class="w"> </span><span class="nx">c1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c3</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"commit"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"commit"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 commit"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"write-write conflict"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 commit"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// But unrelated keys cause no conflict.</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"y"</span><span class="p">,</span><span class="w"> </span><span class="s">"no conflict"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"commit"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Not bad! But let's get stricter.</p>
<p class="note note--edit">
Upon further discussion with Alex Miller, and after reviewing <a
href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf">A
Critique of ANSI SQL Isolation Levels</a>, the difference I am
trying to suggest (between Repeatable Read and Snapshot Isolation)
likely does not exist. A Critique of ANSI SQL Isolation Levels
mentions Repeatable Read must not exhibit P4 (Lost Update)
anomalies. And it mentions that you must check for read-write
conflicts to avoid these. Therefore it seems likely that you can't
easily separate Repeatable Read from Snapshot Isolation when
implemented using MVCC. The differences between Repeatable Read and
Snapshot Isolation may more readily show up when implementing
transactions the classical way with Two-Phase Locking.
<br />
<br />
To reiterate, with MVCC and optimistic concurrency control, correct
implementations of Repeatable Read and Snapshot Isolation do not
seem to be distinguishable. Both require write-write conflict
detection.
</p><h4 id="serializable-isolation">Serializable Isolation</h4><p>In terms of end-result, this is the simplest isolation level to reason
about. Serializable Isolation must appear as if only a single
transaction were executing at a time. Some systems, like SQLite and
TigerBeetle, do Actually Serial Execution where only one transaction
runs at a time. But few databases implement Serializable like this
because it removes a number of fair concurrent execution
histories. For example, two concurrent read-only transactions.</p>
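<p>To make "only one transaction runs at a time" concrete, here is a
rough sketch of serial execution behind a single global lock. This is
not how SQLite or TigerBeetle are implemented; <code>serialStore</code>,
<code>runTx</code>, and <code>errAbort</code> are hypothetical names for this
illustration only.</p>

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"sync"
)

// errAbort is a hypothetical sentinel a transaction body returns to abort.
var errAbort = errors.New("abort")

// serialStore is a hypothetical key-value store that runs exactly one
// transaction at a time.
type serialStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newSerialStore() *serialStore {
	return &serialStore{data: map[string]string{}}
}

// runTx holds the global lock for the entire transaction body, so no
// two transactions ever interleave. The body works on a copy so that
// an error leaves the store untouched.
func (s *serialStore) runTx(body func(data map[string]string) error) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	scratch := make(map[string]string, len(s.data))
	for k, v := range s.data {
		scratch[k] = v
	}
	if err := body(scratch); err != nil {
		return err
	}
	s.data = scratch
	return nil
}

func main() {
	s := newSerialStore()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = s.runTx(func(data map[string]string) error {
				// A missing key parses as 0; the parse error is ignored.
				n, _ := strconv.Atoi(data["counter"])
				data["counter"] = strconv.Itoa(n + 1)
				return nil
			})
		}()
	}
	wg.Wait()
	fmt.Println(s.data["counter"]) // 100: no increment is lost

	err := s.runTx(func(data map[string]string) error {
		data["counter"] = "oops"
		return errAbort
	})
	fmt.Println(err, s.data["counter"]) // abort 100
}
```

<p>Notice that even two read-only transactions have to wait on each
other here, which is exactly the lost concurrency described above.</p>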
<p>Postgres implements serializability via <a href="https://drkp.net/papers/ssi-vldb12.pdf">Serializable Snapshot
Isolation</a>. MySQL implements
serializability via <a href="https://distributed-computing-musings.com/2022/02/transactions-two-phase-locking/">Two-Phase
Locking</a>. FoundationDB
implements serializability via <a href="https://apple.github.io/foundationdb/developer-guide.html">sequential timestamp assignment and
conflict
detection</a>.</p>
<p>But the paper, <a href="https://dl.acm.org/doi/abs/10.1145/2168836.2168853">A critique of snapshot
isolation</a>,
provides a simple approach (though not necessarily an efficient one; I
have no clue) via what they call Write Snapshot Isolation. In their
algorithm, if the read set of one transaction intersects the write
set of another concurrent transaction (an overlap between two write
sets alone does not count), the committing transaction should be
aborted. And this (plus Repeatable Read rules) is sufficient for
Serializability.</p>
<p>I leave it to that paper for the proof of correctness. In terms of
implementing it though it's quite simple and very similar to the
Snapshot Isolation we already mentioned.</p>
<p>Inside of <code>completeTransaction</code> add:</p>
<div class="highlight"><pre><span></span><span class="w"> </span> }) {
<span class="w"> </span> d.completeTransaction(t, AbortedTransaction)
<span class="w"> </span> return fmt.Errorf("write-write conflict")
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ // Serializable Isolation imposes the additional constraint that</span>
<span class="gi">+ // no transaction A may commit after reading any of the same</span>
<span class="gi">+ // keys as transaction B has written and committed during</span>
<span class="gi">+ // transaction A's life, or vice-versa.</span>
<span class="gi">+ if t.isolation == SerializableIsolation && d.hasConflict(t, func(t1 *Transaction, t2 *Transaction) bool {</span>
<span class="gi">+ return setsShareItem(t1.readset, t2.writeset) ||</span>
<span class="gi">+ setsShareItem(t1.writeset, t2.readset)</span>
<span class="gi">+ }) {</span>
<span class="gi">+ d.completeTransaction(t, AbortedTransaction)</span>
<span class="gi">+ return fmt.Errorf("read-write conflict")</span>
<span class="w"> </span> }
<span class="w"> </span> }
</pre></div>
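<p>To see why the read-write check catches something the write-write
check misses, here is a small standalone sketch of the classic write
skew shape: two transactions read the same keys but write different
ones. The <code>txn</code> and <code>keySet</code> types are simplified
stand-ins invented for this example, not the <code>Transaction</code> type
used above.</p>

```go
package main

import "fmt"

// keySet is a hypothetical, simplified set of keys.
type keySet map[string]bool

// txn is a hypothetical, stripped-down transaction that only records
// which keys it read and which keys it wrote.
type txn struct {
	readset  keySet
	writeset keySet
}

// overlaps reports whether a and b share at least one key.
func overlaps(a, b keySet) bool {
	for k := range a {
		if b[k] {
			return true
		}
	}
	return false
}

// writeWriteConflict mirrors the Snapshot Isolation check.
func writeWriteConflict(t1, t2 txn) bool {
	return overlaps(t1.writeset, t2.writeset)
}

// readWriteConflict mirrors the Serializable Isolation check.
func readWriteConflict(t1, t2 txn) bool {
	return overlaps(t1.readset, t2.writeset) ||
		overlaps(t1.writeset, t2.readset)
}

func main() {
	// Both transactions read x and y. A writes x, B writes y.
	a := txn{
		readset:  keySet{"x": true, "y": true},
		writeset: keySet{"x": true},
	}
	b := txn{
		readset:  keySet{"x": true, "y": true},
		writeset: keySet{"y": true},
	}
	fmt.Println(writeWriteConflict(a, b)) // false: write sets are disjoint
	fmt.Println(readWriteConflict(a, b))  // true: A read y, which B wrote
}
```

<p>Under the write-write check alone both transactions would commit.
Under the read-write check, whichever one commits second is aborted.</p>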
<p>And if we add a test for read-write conflicts:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">TestSerializableIsolation_readwrite_conflict</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">database</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span>
<span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">SerializableIsolation</span>
<span class="w"> </span><span class="nx">c1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c3</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"begin"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">,</span><span class="w"> </span><span class="s">"hey"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"commit"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"get"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"x"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"cannot get key that does not exist"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 get x"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">"commit"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 commit"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">"read-write conflict"</span><span class="p">,</span><span class="w"> </span><span class="s">"c2 commit"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// But unrelated keys cause no conflict.</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"set"</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">"y"</span><span class="p">,</span><span class="w"> </span><span class="s">"no conflict"</span><span class="p">})</span>
<span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">"commit"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>We see it work! And that's it for a basic implementation of MVCC and
major transaction isolation levels.</p>
<h3 id="production-quality-testing">Production-quality testing</h3><p>There are two major projects I'm aware of that help you test
transaction implementations: <a href="https://github.com/jepsen-io/elle">Elle</a>
and <a href="https://github.com/ept/hermitage">Hermitage</a>. These are probably
where I'd go looking if I were implementing this for real.</p>
<p>This project took me long enough on its own, and my tests left me
reasonably comfortable that the gist of my logic was right, so I did
not test further. For that reason it surely has bugs.</p>
<h3 id="vacuuming-and-cleanup">Vacuuming and cleanup</h3><p>One of the major things this implementation does not do is clean up
old data. Eventually, older versions of values will be required by no
transactions. They should be removed from the value version
array. Similarly, eventually older transactions will be required by no
transactions. They should be removed from the database transaction
history list.</p>
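<p>As a rough sketch of what that cleanup could look like for a single
key's version array, assume each version records the transaction that
created it and the transaction that ended it (0 meaning still
live). The <code>version</code> struct, the <code>horizon</code> parameter,
and <code>vacuumVersions</code> are hypothetical names for this
illustration. <code>horizon</code> is assumed to be an id such that every
transaction with a smaller id has already finished and is not in any
active transaction's in-progress set.</p>

```go
package main

import "fmt"

// version is a hypothetical copy of one entry in a key's version array.
type version struct {
	txStartId uint64 // transaction that created this version
	txEndId   uint64 // transaction that replaced or deleted it; 0 if none
	value     string
}

// vacuumVersions returns only the versions that some active or future
// transaction might still need.
//
// Assumption: every transaction with id < horizon has finished and is
// not listed as in-progress by any active transaction. committed
// reports whether a given transaction id committed.
func vacuumVersions(versions []version, horizon uint64, committed func(uint64) bool) []version {
	var kept []version
	for _, v := range versions {
		// Created by a finished transaction that did not commit:
		// this version was never visible to anyone else.
		createdByAborted := v.txStartId < horizon && !committed(v.txStartId)
		// Ended by a committed transaction older than the horizon:
		// every active and future transaction already sees the end.
		endedForEveryone := v.txEndId != 0 && v.txEndId < horizon && committed(v.txEndId)
		if createdByAborted || endedForEveryone {
			continue
		}
		kept = append(kept, v)
	}
	return kept
}

func main() {
	committedIds := map[uint64]bool{1: true, 3: true}
	committed := func(id uint64) bool { return committedIds[id] }
	versions := []version{
		{txStartId: 1, txEndId: 3, value: "hey"},  // replaced by committed tx 3
		{txStartId: 2, txEndId: 0, value: "yall"}, // tx 2 aborted
		{txStartId: 3, txEndId: 6, value: "v2"},   // tx 6 is still active
		{txStartId: 6, txEndId: 0, value: "v3"},   // written by active tx 6
	}
	for _, v := range vacuumVersions(versions, 5, committed) {
		fmt.Println(v.value) // prints v2, then v3
	}
}
```

<p>The sketch is deliberately conservative: a version whose ending
transaction aborted or is still running is always kept.</p>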
<p>Even if we had the vacuuming process in place though, what about some
extreme use patterns? What if a key's value was always going to be 1GB
long? And what if multiple transactions made only small changes to the
1GB data? We'd be duplicating a lot of the value across versions.</p>
<p>It sounds less extreme when thinking about storing rows of data rather
than key-value data. If a user has 100 columns and only updates one
column a number of times, in our scheme we'd end up storing a ton of
duplicate cell data for a row.</p>
<p>This is a real-world issue in Postgres that was <a href="https://ottertune.com/blog/the-part-of-postgresql-we-hate-the-most">called
out</a>
by Andy Pavlo and the Ottertune folks. It turns out that Postgres
alone among major databases stores the entire value for every
version. In contrast, other major databases like MySQL store a diff.</p>
<h3 id="conclusion">Conclusion</h3><p>This post only begins to demonstrate that database behavior differs
quite a bit both in terms of results and in terms of
optimizations. Everyone implements the ideas differently and to
varying degrees.</p>
<p>Moreover, we have only begun to implement the behavior a real SQL
database supports. For example, how do visibility rules and conflict
detection work with range queries? What about sub-transactions, and
save points? These will have to be covered another time.</p>
<p>Hopefully seeing this simple implementation of MVCC and visibility rules
helps to clarify at least some of the basics.</p>
http://notes.eatonphil.com/2024-05-16-mvcc.htmlThu, 16 May 2024 00:00:00 +0000
- What makes a great technical bloghttp://notes.eatonphil.com/2024-04-10-what-makes-a-great-tech-blog.html<p>I want to explain why the blogs in <a href="https://lists.eatonphil.com/blogs.html">My favorite technical
blogs</a> are my favorite. That
page is solely about non-corporate tech blogs. So this post is
too. I'll have to make another list for favorite corporate tech blogs.</p>
<p>In short, they:</p>
<ul>
<li>Tackle hard and confusing topics</li>
<li>Show working code</li>
<li>Make things simpler</li>
<li>Write regularly</li>
<li>Talk about tradeoffs and downsides</li>
<li>Avoid internet slang, memes, swearing, sarcasm, and ranting</li>
</ul>
<h3 id="tackle-hard-and-confusing-topics">Tackle hard and confusing topics</h3><p>There are a number of problems in programming and computer science
where otherwise knowledgeable programmers start mumbling, or fall
back on cliches or groupthink, because they aren't
sure.</p>
<p>These are the best topics you can possibly dive deep into. And my
favorite writers do exactly this.</p>
<p>They write about durability guarantees of disks and filesystems. They
write about common pitfalls in benchmarking. They write about database
consistency anomalies. They write about threading and IO models.</p>
<p>And they write about it by showing concrete examples and concrete logic
so you can learn how to stop handwaving on the topic.</p>
<p>Their writing helps you come out with a useful mental model you can
apply to your own problems.</p>
<p>And you know, sometimes it's not about the topic being
obscure. Good writers have the ability to tackle a boring topic in an
interesting light. Maybe by digging deeper into a root cause. Or
showing you the history behind the scenes.</p>
<p>Moreover, my favorite writers don't know everything. But they also
don't pretend to know everything. They're quick to admit they don't
understand something and ask for help from their readers.</p>
<h3 id="show-working-code">Show working code</h3><p>I love to see complete working code in a post. In contrast there are
many projects that start out simple and people write an article that
covers the project at a high level. But they keep working on the
project and it becomes more complex.</p>
<p>It's not always easy to follow commits over time.</p>
<p>Eli Bendersky and Serge Zaitsev are particularly great at developing
small but meaningful projects in a single post or short series.</p>
<p>On the other hand, if people only did this, we wouldn't hear about the
development of long-running projects like V8 or Postgres. So I guess
this style has limits. And I don't penalize people talking about
long-running projects for not showing working code.</p>
<h3 id="make-things-simpler">Make things simpler</h3><p>One of the marks of a good writer is that they can make complex
topics simple. And not just by being reductive. Though sometimes even
being reductive is useful for education.</p>
<p>In contrast I sometimes see articles by less experienced writers and I
marvel at how they make a simple topic so complex. I recognize this
because I was <em>absolutely</em> like that 10 years ago, if not 5 years ago.</p>
<h3 id="write-regularly">Write regularly</h3><p>My favorite blogs typically get a new post at least once a month. Some
people, like Murat, write once a week.</p>
<p>I think the practice probably does improve your writing but mostly
it's that they keep my attention by publishing regularly!</p>
<h3 id="talk-about-tradeoffs-and-downsides">Talk about tradeoffs and downsides</h3><p>Nothing builds trust like talking about the issues with something you
built. No project is perfect. And to ignore the downsides risks
seeming like you don't know or understand them.</p>
<p>So the writers I like the most talk about decisions in context. They
talk about the good and the bad.</p>
<h3 id="avoid-internet-slang,-memes,-swearing,-sarcasm,-and-ranting">Avoid internet slang, memes, swearing, sarcasm, and ranting</h3><p>There's no way I can think of talking about this without sounding
super lame.</p>
<p>One thing I've noticed, particularly among younger colleagues, is the
use of memes, swearing, 4chan slang, or sarcasm. I
used to write like this 10 years ago too.</p>
<p>There is a chunk of your audience who won't care. The problem is that
there's also a chunk of your (potential) audience who definitely does
care. There's even a chunk of your audience who may not care but just
won't understand (i.e. non-native English speakers).</p>
<p>I have friends and folks I respect who write very well but who
are also overly, unnecessarily edgy when they write. I don't like
sharing these posts because I don't want to unnecessarily offend or
turn people off.</p>
<h3 id="closing-thoughts">Closing thoughts</h3><p>It would be boring if everyone wrote the same way. I'm glad the
internet is fun and weird. But I wanted to share a few things
that go into my favorite technical blogs that I'm always happy to
refer people to.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a short post on what I think makes a great technical blog.<a href="https://t.co/QRFtQyQyU5">https://t.co/QRFtQyQyU5</a> <a href="https://t.co/QpsQC90EX5">pic.twitter.com/QpsQC90EX5</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1778184061447774328?ref_src=twsrc%5Etfw">April 10, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-04-10-what-makes-a-great-tech-blog.htmlWed, 10 Apr 2024 00:00:00 +0000
- A paper reading club at work; databases and distributed systems researchhttp://notes.eatonphil.com/2024-04-05-company-paper-club.html<p>I started a paper reading club this week at work, focused on databases
and distributed systems research. I posted in a general channel about
the premise and asked if anyone was interested. I clarified the
intended topic and that discussions would be asynchronous over email,
run fortnightly.</p>
<p>Weekly seemed way too often. A month seemed way too
infrequent. Fortnightly seemed decent.</p>
<p>I was nervous to do this because I've been here about 2 months. In the
past I would have waited 6 months or a year to do this. But I don't
know. If you see something you think should exist, why wait?</p>
<p>The only other consideration was <a href="https://notes.eatonphil.com/eight-years-of-tech-meetups.html">past experiences I've written
about</a>
having difficulty getting engagement with clubs at work. But EDB has
nearly 1,000 employees. I figured there might at least be a couple
interested.</p>
<p>Furthermore I figured if I only got a few people this entire idea
would at least benefit myself, since I have been wanting to force
myself to build a paper reading habit. And if no one responded, it
would be only mildly embarrassing and I'd not pursue it further.</p>
<p>But after a day, about 6 people showed interest. Which was better than
I hoped! Folks from product management, support, development, and
beyond.</p>
<p>So I opened a dedicated channel and asked people to start submitting
papers and voting on them. One of my teammates started submitting some
great papers on caches and reference counting.</p>
<p>I picked a first one, the Redshift paper, to get us started and to
demonstrate the process to avoid confusion. And I made a calendar
invite for everyone in the channel, the paper linked in the invite. I
clarified in the invite that it was just a reminder and that the real
discussion would still be async over email. (I've found it's best to
repeatedly clarify process stuff.)</p>
<p>Once I had these first few folks interested I was able to post again
in a broader company channel that a couple of us were starting this
paper club. By the end of the day the dedicated channel had 29
folks. All in about 2 days.</p>
<p>Mailing lists are nicer than Slack or Discord in my opinion because
they sort of force you to slow down, they are harder to miss (if
someone starts a thread after you've seen a message in Slack or
Discord, you tend to miss it), and easier to manage
(read/unread).</p>
<p>Engineers often seem to get overwhelmed by a mass of Slack
messages. Whereas they seem to be a bit more comfortable with email
threads.</p>
<p>All of this is all the more important when you're running a global
group. EDB has people everywhere.</p>
<p>Why do this?</p>
<p>Before I dropped out of college I did a research internship with a
VLSI group at Harvard SEAS. And my favorite part was that they had a
weekly (or biweekly?) Wednesday paper reading session where 15 people
from the lab and adjacent labs would eat pizza after hours and discuss
a paper.</p>
<p>I've been dying to recreate this at a company ever since. Since EDB is
so distributed, we won't be discussing over pizza. But I'm still
excited.</p>
<p>And I hope my experience serves as a blueprint for others.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I started a paper reading club at work, wrote about it as a possible blueprint for others.<br><br>I'm excited! I've wanted to have a gang at work with whom to read papers for a long time.<a href="https://t.co/vpwERj8pHe">https://t.co/vpwERj8pHe</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1776415593173782938?ref_src=twsrc%5Etfw">April 6, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-04-05-company-paper-club.htmlFri, 05 Apr 2024 00:00:00 +0000
- Finding memory leaks in Postgres C codehttp://notes.eatonphil.com/2024-03-27-finding-memory-leaks-in-postgres-c-code.html<head>
<meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/finding-memory-leaks-postgres-c-code'" />
</head><p>This is an external post of mine. Click
<a href="https://www.enterprisedb.com/blog/finding-memory-leaks-postgres-c-code">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2024-03-27-finding-memory-leaks-in-postgres-c-code.htmlWed, 27 Mar 2024 00:00:00 +0000
- Zig, Rust, and other languageshttp://notes.eatonphil.com/2024-03-15-zig-rust-and-other-languages.html<p>Having worked a bit in Zig, Rust, Go and now C, I think there are a
few common topics worth having a fresh conversation on: automatic
memory management, the standard library, and explicit allocation.</p>
<p>Zig is not a mature language. But it has made enough useful choices
for a number of companies to invest in it and run it in
production. The useful choices make Zig worth talking about.</p>
<p>Go and Rust are mature languages. But they have both made questionable
choices that seem worth talking about.</p>
<p>All of these languages are developed by highly intelligent folks I
personally look up to. And your choice to use any one of these is
certainly fine, whichever it is.</p>
<p>The positive and negative choices particular languages made, though, are
worth talking about as we consider what a systems programming language
10 years from now would look like. Or how these languages themselves
might evolve in the next 10 years.</p>
<p>My perspective is mostly building distributed databases. So the points
that I bring up may have no relevance to the kind of work you do, and
that's alright. Moreover, I'm already aware most of these opinions are
not shared by the language maintainers, and that's ok too. I am not
writing to convince anyone.</p>
<h3 id="automatic-memory-management">Automatic memory management</h3><p>One of my bigger issues with Zig is that it doesn't support RAII. You
can defer cleanup to the end of a block, and this solves half of the
problem. But only RAII will allow for smart pointers and automatic
(not manual) reference counting. RAII is an excellent option to
default to, but in Zig you aren't allowed to. In contrast, even C
"supports" automatic cleanup (via compiler extensions).</p>
<p>But most of the time, arenas are fine. Postgres is written in C and
memory is almost entirely managed through nested arenas (called
"memory contexts") that get cleaned up when some subset of a task
finishes, recursively. Zig has builtin support for arenas, which is
great.</p>
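<p>As a rough model of what "nested arenas cleaned up recursively"
means, here is a toy sketch in Python. It is not the Postgres API; the
<code>MemoryContext</code> class and its methods are hypothetical, and
since Python is garbage collected this only models the bookkeeping,
not real memory management.</p>

```python
class MemoryContext:
    """Toy arena: owns allocations and child contexts (hypothetical API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.allocations = []
        if parent is not None:
            parent.children.append(self)

    def alloc(self, size):
        """Hand out a buffer whose lifetime is tied to this context."""
        buf = bytearray(size)
        self.allocations.append(buf)
        return buf

    def live_bytes(self):
        """Bytes held by this context and everything nested under it."""
        return sum(len(b) for b in self.allocations) + sum(
            child.live_bytes() for child in self.children
        )

    def delete(self):
        """Release this context and, recursively, all of its children."""
        for child in list(self.children):
            child.delete()
        self.allocations.clear()
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None


top = MemoryContext("top")
query = MemoryContext("query", parent=top)
expr = MemoryContext("expression", parent=query)

top.alloc(64)
query.alloc(1024)
expr.alloc(256)
print(top.live_bytes())   # 1344

query.delete()            # frees the query arena and its nested expression arena
print(top.live_bytes())   # 64
```

<p>The appeal is that nobody has to free the individual buffers: when
the subtask finishes, one call on its arena releases everything
allocated underneath it.</p>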
<h3 id="standard-library">Standard library</h3><p>It seems regrettable that some languages have been shipping smaller
standard libraries. Smaller standard libraries seem to encourage users
of the language to install more transitively-unvetted third-party
libraries, which increases build time and build flakiness, and which
increases bitrot over time as unnecessary breaking changes occur.</p>
<p>People have been making jokes about <code>node_modules</code> for a decade now, but
this problem is just as bad in Rust codebases I've seen. And to a
degree it happens in Java and Go as well, though their larger standard
libraries allow you to get further without dependencies.</p>
<p>Zig has a good standard library, which may be Go and Java tier in a
few years. But one goal of their package manager seemed to be
to allow the standard library to be broken up; made smaller. For
example, JSON support moving out of the standard library into a
package. I don't know if that is actually the planned direction. I
hope not.</p>
<p>Having a large standard library doesn't mean that the programmer
shouldn't be able to swap out implementations easily as needed. All
that is required is for the standard library to define an
<strong>interface</strong> along with its own implementation.</p>
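<p>A minimal sketch of that idea in Python, assuming a hypothetical
<code>Compressor</code> interface and a hypothetical <code>archive</code>
function that depends only on it. The two implementations wrap the real
<code>zlib</code> and <code>bz2</code> modules; everything else is made up
for illustration.</p>

```python
import bz2
import zlib
from typing import Protocol


class Compressor(Protocol):
    """The interface a standard library could publish (hypothetical)."""

    def compress(self, data: bytes) -> bytes: ...
    def decompress(self, data: bytes) -> bytes: ...


class ZlibCompressor:
    """Stand-in for the implementation that ships by default."""

    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decompress(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class Bz2Compressor:
    """A drop-in replacement a programmer might swap in."""

    def compress(self, data: bytes) -> bytes:
        return bz2.compress(data)

    def decompress(self, data: bytes) -> bytes:
        return bz2.decompress(data)


def archive(payload: bytes, compressor: Compressor = ZlibCompressor()) -> bytes:
    """Library code depends only on the interface, not on one implementation."""
    return compressor.compress(payload)


payload = b"hello " * 100
default_blob = archive(payload)                   # default implementation
swapped_blob = archive(payload, Bz2Compressor())  # caller-supplied implementation
```

<p>Callers who never think about compression get the default, and
callers who care can pass their own implementation without pulling the
functionality out of the standard library.</p>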
<p>The small size of the standard library doesn't just affect developers
using the language, it even encourages developers of the language
itself to depend on libraries owned by individuals.</p>
<p>Take a look at the transitive dependencies of an official Node.js
package like
<a href="https://github.com/nodejs/node-gyp/blob/main/package.json#L25">node-gyp</a>. Is
it really the ideal outcome of a small standard library to encourage
dependence in official libraries on libraries owned by individuals,
like <a href="https://github.com/sindresorhus/env-paths">env-paths</a>, that
haven't been modified in 3 years? 68 lines of code. Is it not safer at
this point to vendor that code? i.e. copy the <code>env-paths</code> code into
<code>node-gyp</code>.</p>
<p>Similarly, if you go looking for compression support in Rust, there's
none in the standard library. But you may notice the
<a href="https://github.com/rust-lang/flate2-rs">flate2-rs</a> repo under the
official <a href="https://github.com/rust-lang">rust-lang</a> GitHub
namespace. If you look at its transitive dependencies:
<a href="https://github.com/rust-lang/flate2-rs/blob/main/Cargo.toml#L23">flate2-rs</a>
depends on (an individual's)
<a href="https://github.com/Frommi/miniz_oxide/blob/master/miniz_oxide/Cargo.toml#L20">miniz_oxide</a>
which depends on (an individual's)
<a href="https://github.com/jonas-schievink/adler">adler</a> that hasn't been
updated in 4 years. 300 lines of code including tests. Why not vendor
this code? It's the habits a small standard library builds that seem
to encourage everyone not to.</p>
<p>I don't mean these necessarily constitute a supply-chain risk. I'm not
talking about
<a href="https://www.theregister.com/2016/03/23/npm_left_pad_chaos/">left-pad</a>. But
the pattern is sort of clear. Even official packages may end up
depending on third-party packages, because the commitment to a
small standard library meant omitting stuff like compression,
checksums, and common OS paths.</p>
<p>It's a tradeoff and maybe makes the job of the standard library
maintainer easier. But I don't think this is the ideal
situation. Dependencies are useful but should be kept to a reasonable
minimum.</p>
<p>Hopefully languages end up more like Go than like Rust in
this regard.</p>
<h3 id="explicit-allocation">Explicit allocation</h3><p>When folk discuss the Zig standard library's pattern of requiring an
allocator argument for every method that allocates, they often talk
about the benefit of swapping out allocators or the benefit of being
able to handle OOM failures.</p>
<p>Both of these seem pretty niche to me. For example, in Zig tests you
are encouraged to pass around a debug allocator that tells you about
memory leaks. But this doesn't seem too different from compiling a C
project with a debug allocator or compiling with different sanitizers
on and running tests against the binary produced. In both cases you
mostly deal with allocators at a global level depending on the
environment you're running the code in (production or tests).</p>
<p>The real benefit of explicit allocations to me is much more
trivial. You basically can't code a method in Zig without
acknowledging allocations.</p>
<p>This is particularly useful for hotpath code. Take an iterator for
example. It has a <code>new()</code> method, a <code>next()</code> method, and a <code>done()</code>
method. In most languages, it's basically impossible at the syntax or
compiler-level to know if you are allocating in the <code>next()</code> method. You
may know because you know the behavior of all the code in <code>next()</code> by
heart. But that won't happen all the time.</p>
<p>Zig is practically alone in that if you write the <code>next()</code> method and
don't pass an allocator to any method in the <code>next()</code> body,
nothing in that <code>next()</code> method will allocate.</p>
<p>In any other language it might not be until you run a profiler that
you notice an allocation that should have been done once in <code>new()</code>
accidentally ended up in <code>next()</code> instead.</p>
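<p>The convention can be modeled in Python, with the caveat that Python
itself allocates objects behind the scenes, so this is only a sketch of
the convention and not of Zig's actual behavior. The
<code>CountingAllocator</code> and <code>ChunkIterator</code> names are
hypothetical. The only method that receives an allocator is the
constructor, so that is the only place the iterator can request memory
through it.</p>

```python
class CountingAllocator:
    """Hypothetical allocator that records every request made through it."""

    def __init__(self):
        self.calls = 0

    def alloc(self, size):
        self.calls += 1
        return bytearray(size)


class ChunkIterator:
    """Walks `data` in fixed-size chunks, reusing one scratch buffer."""

    def __init__(self, allocator, data, chunk_size):
        # The constructor is the only method handed an allocator, so it is
        # the only place this iterator requests memory (by convention).
        self.buf = allocator.alloc(chunk_size)
        self.data = data
        self.pos = 0

    def done(self):
        return self.pos >= len(self.data)

    def next(self):
        # No allocator parameter here: the chunk is copied into the
        # existing scratch buffer and the number of valid bytes is returned.
        n = min(len(self.buf), len(self.data) - self.pos)
        self.buf[:n] = self.data[self.pos:self.pos + n]
        self.pos += n
        return n


allocator = CountingAllocator()
it = ChunkIterator(allocator, b"abcdefghij", 4)
chunks = []
while not it.done():
    n = it.next()
    chunks.append(bytes(it.buf[:n]))

print(chunks)           # [b'abcd', b'efgh', b'ij']
print(allocator.calls)  # 1
```

<p>If someone later moved the buffer allocation into
<code>next()</code>, the signature would have to grow an allocator
parameter, which is exactly the kind of change that stands out in
review.</p>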
<p>On the other hand, for all the same reasons, writing Zig is kind of a
pain because everything takes an allocator!</p>
<p>Explicit allocation is not intrinsic to Zig, the language. It is a
convention that is prevalent in the standard library. There is still a
global allocator and any user of Zig could decide to use the global
allocator. At which point you've got implicit allocation. So explicit
allocation as a convention isn't a perfect solution.</p>
<p>But it, by default, gives you a level of awareness of allocations you
just can't get from typical Go or Rust or C code, depending on the
project's practices. Perhaps it's possible to switch off the Go, Rust
and C standard library and use one where all functions that allocate
do require an allocator.</p>
<p>But explicitly passing allocators is still sort of a visual hack.</p>
<p>I think the ideal situation in the future will be that every language
supports annotating blocks of code as <code>must-not-allocate</code> or something
along those lines. Either the compiler will enforce this and fail if
you seem to allocate in a block marked <code>must-not-allocate</code>, or it will
panic during runtime so you can catch this in tests.</p>
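<p>The runtime-panic variant can be approximated today at the library
level. Here is a sketch in Python, assuming all allocations go through
a hypothetical <code>GuardedAllocator</code>; the
<code>must_not_allocate</code> context manager and the
<code>AllocationForbidden</code> exception are also made up. It only
catches requests routed through that allocator, which is why real
enforcement would need compiler or runtime support.</p>

```python
from contextlib import contextmanager


class AllocationForbidden(RuntimeError):
    """Raised when code allocates inside a must_not_allocate block."""


class GuardedAllocator:
    """Hypothetical allocator that can be told to refuse requests."""

    def __init__(self):
        self.forbidden = False

    def alloc(self, size):
        if self.forbidden:
            raise AllocationForbidden(f"alloc({size}) inside must_not_allocate")
        return bytearray(size)

    @contextmanager
    def must_not_allocate(self):
        """Fail loudly if anything allocates through this allocator in the block."""
        previous = self.forbidden
        self.forbidden = True
        try:
            yield
        finally:
            self.forbidden = previous  # restore, so guards can nest


allocator = GuardedAllocator()
scratch = allocator.alloc(16)          # fine: outside the guarded block

with allocator.must_not_allocate():
    scratch[0] = 42                    # fine: reuses existing memory

try:
    with allocator.must_not_allocate():
        allocator.alloc(16)            # the accidental hot-path allocation
    caught = False
except AllocationForbidden:
    caught = True

print(caught)  # True
```

<p>Wrapping only the hot path in the guard is the point: the rest of
the program stays free to allocate, and a test run surfaces the one
allocation that slipped into the block.</p>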
<p>This would be useful beyond static programming languages. It would be
as interesting to annotate blocks in JavaScript or Python as
<code>must-not-allocate</code> too.</p>
<p>Otherwise the current state of things is that you'd normally configure
this sort of thing at the global level. Saying "there must not be
any allocations in this entire program" just doesn't seem as useful in
general as being able to say "there must not be any allocations in
this one block".</p>
<h4 id="optional,-not-required,-allocator-arguments">Optional, not required, allocator arguments</h4><p>Rust has nascent support for passing an allocator to methods that
allocate. But it's optional. From what I understand, C++ STL is like
this too.</p>
<p>These are both super useful for programming extensions. And it's one
of the reasons I think Zig makes a ton of sense for Postgres
extensions specifically. Because it was only and always ever built for
running in an environment with someone else's allocator.</p>
<h3 id="praise-for-zig,-rust,-and-go-tooling">Praise for Zig, Rust, and Go tooling</h3><p>All three of these have really great first-party tooling including
build system, package management, test runners and formatters. The
idea that the language should provide a great environment to code in
(end-to-end) makes things simpler and nicer for programmers.</p>
<h3 id="meandering-non-conclusion">Meandering non-conclusion</h3><p>Use the language you want to use. Zig and Rust are both nice
alternatives to writing vanilla C.</p>
<p>On the other hand, I've been pleasantly surprised by how high level
Postgres C is. It's almost a separate language since you're
often dealing with user-facing constructs, like Postgres's Datum
objects which represent what you might think of as a cell in a
Postgres database. And you can use all the same functions provided for
Postgres SQL for working with Datums, but from C.</p>
<p>I've also been able to work a bit on Postgres extensions in Rust with
<a href="https://github.com/pgcentralfoundation/pgrx">pgrx</a> lately, which I
hope to write about soon. And when I saw
<a href="https://github.com/xataio/pgzx">pgzx</a> for writing Postgres extensions in Zig
I was excited to spend some time with that too.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a post on my wishlist for Zig and Rust. Focused on automatic memory management, the standard library, and explicit allocation.<a href="https://t.co/dvynizU9V2">https://t.co/dvynizU9V2</a> <a href="https://t.co/iTXp5QVxj0">pic.twitter.com/iTXp5QVxj0</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1768725864923931033?ref_src=twsrc%5Etfw">March 15, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-03-15-zig-rust-and-other-languages.htmlFri, 15 Mar 2024 00:00:00 +0000
- First month on a database teamhttp://notes.eatonphil.com/2024-03-11-first-month-on-a-database-team.html<p><!-- -*- mode: markdown -*- --></p>
<p>A little over a month ago, I joined EnterpriseDB on a
distributed Postgres product
(<a href="https://enterprisedb.com/docs/pgd">PGD</a>). The process of onboarding
myself has been pretty similar at each company in the last decade,
though I think I've gotten better at it. The process is of course
influenced by the team, and my coworkers have been excellent. Still, I
wanted to share my thought process and personal strategies.</p>
<h3 id="avoid,-at-first,-what-is-always-challenging">Avoid, at first, what is always challenging</h3><p>Trickier things at companies are the people, organization, and
processes. What code exists? How does it work together? Who owns what?
How can I find easy code issues to tackle? How do I know what's
important (so I can avoid picking it up and becoming a bottleneck)?</p>
<p>But also, in the first few days or weeks you aren't exactly expected
to contribute meaningfully to features or bugs. Your sprint
contributions are not tracked too closely.</p>
<p>The combination of 1) what to avoid and 2) the sprint-freedom-you-have
leads to a few interesting and valuable areas to work on on your own:
the build process, tests, running the software, and docs.</p>
<p>But code need not be ignored either. Some frequent areas to get your
first code contributions in include user configuration code, error
messages, and stale code comments.</p>
<p>What follows are some little 1st day, 1st week, 1st month projects I
went through to bootstrap my understanding of the system.</p>
<h3 id="build-process">Build process</h3><p>First off, where is the code and how do you build it? This requires
you to have all the relevant dependencies. Much of my work is on a
Postgres extension. This meant having a local Postgres development
environment, having gcc, gmake (on mac), Perl, and so on. And
furthermore, PGD is a pretty mature product so it supports building
against multiple Postgres distributions. Can I build against all of
them?</p>
<p>The easiest situation is when there are instructions for all of this,
linked directly from your main repo. When I started, the instructions
did exist but in a variety of places. So over the first week I started
collecting all of what I had learned about building the system, with
dependencies, across distributions, and with various important flags
(debug mode on, asserts enabled, etc.). I finished the first week by
writing a little internal blog post called "Hacking on PGD".</p>
<p>I hadn't yet figured out the team processes so I didn't want to bother
anyone by trying to get this "blog post" committed anywhere yet as
official internal documentation. Maybe there already was a good doc, I
just hadn't noticed it yet. So I just published it in a private
Confluence page and shared it in the private team Slack. If anyone
else benefited from it, great! Otherwise, I knew I'd want to refer
back to it.</p>
<p>This is an important attitude I think. It can be hard to tell what
others will benefit from. If you get into the habit of writing things
down internally for your own sake, but making it available internally,
it is likely others will benefit from it too. This is something I've
learned from years of blogging publicly outside of work.</p>
<p>Moreover, the simple act of writing a good post establishes you as
something of an authority. This is useful for you, if for no one else.</p>
<h4 id="writing-a-good-post">Writing a good post</h4><p>Let's get distracted here for a second. One of the most important
things I think in documentation is documenting not just what does
exist but what doesn't. If you had to take a path to get something to
work, did you try other paths that didn't work? It can be extremely
useful to figure out what <em>exactly</em> is required for something.</p>
<p>Was there a flag that you tried to build with but you didn't try
building without it? Well try again without it and make sure it was
necessary. Was there some process you executed where the build
succeeded but you can't remember if it was actually necessary for the
build to succeed?</p>
<p>It's difficult to explain why I think this sort of precision is
useful but I'm pretty sure it is. Maybe because it builds the habit of
not treating things as magic when you can avoid it. It builds the
habit of asking questions (if only to yourself) to understand and not
just to get by.</p>
<h4 id="static-analysis?-dynamic-analysis?">Static analysis? Dynamic analysis?</h4><p>Going back to builds, another aspect to consider is static and dynamic
analysis. Are there special steps to using gdb or valgrind or other
analyzers? Are you using them already? Can you get them running
locally? Has any of this been documented?</p>
<p>Maybe the answer to all of those is yes, or maybe none of those are
relevant but there are likely similar tools for your ecosystem. If
analysis tools are relevant and no one has yet explored them, that's
another very useful area to explore as a newcomer.</p>
<h3 id="testing">Testing</h3><p>After I got the builds working, I felt the obvious next step was to
run tests. But what tests exist? Are there unit tests? Integration
tests? Anything else? Moreover, is there test coverage? I was certain
I'd be able to find some low-hanging contributions to make if I could
find some files with low test coverage.</p>
<p>Alas, my certainty hit the wall in that there were in fact too many
types of integration tests that all do provide coverage already. They
just don't all <em>report</em> coverage.</p>
<p>The easiest ways to report coverage (with gcov) were only reporting
coverage for certain integration tests that we run locally. There are
more integration tests run in cloud environments and getting coverage
reports there to merge with my local coverage files would have
required more knowledge of people and processes, areas that I wanted
not to be forced to think about too quickly.</p>
<p>So coverage wasn't a good route to go. But around this time, I noticed
a ticket that asked for a simple change to user configuration code. I
was able to make the change pretty quickly and wanted to add tests. We
have our own test framework built on top of Postgres's powerful Perl
test framework. But it was a little difficult to figure out how to use
either of them.</p>
<p>So I copied code from other tests and pared it down until I got the
smallest version of test code I could get. This took maybe a day or
two of tweaking lines and rerunning tests since I didn't understand
everything that was/wasn't required. Also it's Perl and I've never
written Perl before so that took a bit of time and ChatGPT. (Arrays,
man.)</p>
<p>In the end though I was able to collect my learnings into another
internal Confluence post just about how to write tests, how to debug
tests, how to do common things within tests (for example, ensuring a
Postgres log line was output), etc. I published this post as well
and shared it in the team Slack.</p>
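<p>To give a flavor of the "ensure a log line was output" kind of
helper, here is a generic sketch in Python. It is not our test
framework and not Postgres's Perl framework; the
<code>wait_for_log_line</code> function, its parameters, and the log
file name are all hypothetical. It polls a log file until a line
matches a pattern or a timeout expires.</p>

```python
import os
import re
import tempfile
import time


def wait_for_log_line(path, pattern, timeout=5.0, interval=0.1):
    """Poll a log file until a line matches `pattern`; return that line.

    Raises TimeoutError if nothing matches within `timeout` seconds.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    if regex.search(line):
                        return line.rstrip("\n")
        except FileNotFoundError:
            pass  # the server may not have created the log yet
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no log line matching {pattern!r} in {path}")
        time.sleep(interval)


# Simulate a server log so the sketch runs without a Postgres instance.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "postgresql.log")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("LOG:  database system is ready to accept connections\n")
    found = wait_for_log_line(log_path, r"ready to accept connections")

print(found)
```

<p>Polling with a deadline matters because the server writes its log
asynchronously; a single read right after starting the server would
make the test flaky.</p>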
<h3 id="running">Running</h3><p>I had PGD built locally and was able to run integration tests locally,
but I still hadn't gotten a cluster running, nor had I played with the
eventual consistency demos I knew we supported. We had a great
quickstart that ran through all the manual steps of getting a two-node
cluster up. This was a distillation, for devs, of a more elaborate
process we give to customers in a production-quality script.</p>
<p>But I was looking for something in between a production-quality
script and manually initializing a local cluster. And I also wanted to
practice my understanding of our test process. So I ported our
quickstart to our integration test framework and made a PR with this
new test, eventually merging this into the repo. And I wrote a minimal
Python script for bringing up a local cluster. I've got an open PR to
add this script to the repo. Maybe I'll learn though that a simple
script such as this does already exist, and that's fine!</p>
<h3 id="docs">Docs</h3><p>The entire time, as I'd been trying to build and test and run PGD, I
was trying to understand our terminology and architecture by going
through our public docs. I had a lot of questions coming out of this
that I'd ask in the team channel.</p>
<p>Not to toot my horn but I think it's somewhat of a superpower to be
able/willing to ask "dumb questions" in a group setting. That's how I
frame it anyway. "Dumb question: what does X mean in this paragraph?"
Or, "dumb question: when we say performance improvement because of Y,
what is the intuition here?" Because of the time spent here, I was
able to make a few more docs contributions as I read through the docs
as well.</p>
<p>You have to balance where you ask your dumb questions though. Asking
dumb questions to one person doesn't benefit the team. But asking dumb
questions in too wide a group is sometimes bad politics. Asking "dumb
questions" in front of your team seems to have the best bang for buck.</p>
<p>But maybe the more important contributions were, when I got more
comfortable with the team, proposing to merge my personal, internal
Confluence blog posts into the repo as docs. I think in a number of
cases, what I wrote about indeed hadn't been concisely collected
before and thus was useful to have as team documentation.</p>
<p>Even more challenging was trying to distill (a chunk of) the internal
architecture. Only after following many varied internal docs and
videos, and following through numerous code paths, was I able to
propose an architecture diagram outlining major components and
communication between them, with their differing formats (WAL records,
internal enums, etc.) and means of communication (RPC, shared memory,
etc.). This architecture diagram is still in review and may be totally
off. But it's already helped at least me think about the system.</p>
<p>In most cases this was all information that the team had already
written or explained but just bringing it together and summarizing
provided a different useful perspective I think. Even if none of the
docs got merged it still helped to build my own understanding.</p>
<h3 id="beyond-the-repo">Beyond the repo</h3><p>Learning the project is just one aspect of onboarding. Beyond that I
joined the #cats channel, the #dogs channel, found some fellow New
Yorkers and opened a NYC channel, and tried to find Zoom-time with the
various people I'd see hanging around common team Slack
channels. Trying to meet not just devs but support folk, product
managers, marketing folk, sales folk, and anyone else!</p>
<p>Walking the line between scouring our docs and GitHub and Confluence
and Jira on my own, and bugging people with my incessant questions.</p>
<p>I've enjoyed my time at startups. I've been a dev, a manager, a
founder, a cofounder. But I'm incredibly excited to be back, at a
bigger company, full-time as a developer hacking on a database!</p>
<p>And what about you? What do you do to onboard yourself at a new
company or new project?</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I've been having an absolute blast in my first month at EDB and I wanted to share a few of my strategies for onboarding myself on a database team. Strategies broadly applicable for devs on a new team/project.<a href="https://t.co/TS5qRLysuA">https://t.co/TS5qRLysuA</a> <a href="https://t.co/lvuxDBQJwx">pic.twitter.com/lvuxDBQJwx</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1767371003527672237?ref_src=twsrc%5Etfw">March 12, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-03-11-first-month-on-a-database-team.htmlMon, 11 Mar 2024 00:00:00 +0000
- An intuition for distributed consensus in OLTP systemshttp://notes.eatonphil.com/2024-02-08-an-intuition-for-distributed-consensus-in-oltp-systems.html<p><!-- -*- mode: markdown -*- --></p>
<p>Distributed consensus in transactional databases (e.g. etcd or
Cockroach) is a big deal these days. Most often under the hood are
variations of log-based Paxos-like algorithms such as MultiPaxos,
Viewstamped Replication, or Raft. While there are new variations that
come out each year, optimizing for various workloads, these algorithms
are fairly standard and well-understood.</p>
<p>In fact they are used in so many places, Kubernetes for example, that
even if you don't decide to implement Raft (which is fun and I
encourage it), it seems worth building an intuition for distributed
consensus.</p>
<p>That intuition tells you what happens as you tweak a configuration,
what happens as the production environment changes, and what to reach
for as product requirements change.</p>
<p>I've been <a href="https://notes.eatonphil.com/2023-05-25-raft.html">thinking</a>
<a href="https://eatonphil.com/2023-ddia.html">about</a> the
<a href="https://eatonphil.com/2023-database-internals.html">basics</a> of
<a href="https://github.com/eatonphil/raft-rs">distributed consensus</a>
recently. There has been a lot to digest and characterize. And I'm
only beginning to get an understanding.</p>
<p>This post is an attempt to share some of the intuition built up
reading about and working in this space. Originally this post was also
going to end with a walkthrough of my <a href="https://github.com/eatonphil/raft-rs">most
recent</a> Raft implementation in
Rust. But I'm going to hold off on that for another time.</p>
<p>I was fortunate to have a few excellent reviewers looking at versions
of this post: Paul Nowoczynski, Alex Miller, Jack Vanlightly, Daniel
Chia, and Alex Petrov. Thank you!</p>
<p>Let's start with Raft.</p>
<h3 id="raft">Raft</h3><p>Raft is a distributed consensus algorithm that allows you to build a
replicated state machine on top of a replicated log.</p>
<p>A Raft library handles replicating and durably persisting a sequence
(or <i>log</i>) of commands to at least a majority of nodes in a
cluster. You provide the library a state machine that interprets the
replicated commands. From the perspective of the Raft library,
commands are just opaque byte strings.</p>
<p>For example, you could build a replicated key-value store out of <code>SET</code>
and <code>GET</code> commands that are passed in by a client. You provide a Raft
library state machine code that interprets the Raft log of <code>SET</code> and
<code>GET</code> commands to modify or read from an in-memory hashtable. You can
find concrete examples of exactly this replicated key-value store
modeling in <a href="https://notes.eatonphil.com/tags/raft.html">previous Raft
posts</a> I've written.</p>
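<p>As a rough illustration of the split between library and state
machine, here is a minimal sketch in Python. The <code>KVStateMachine</code>
class and its space-separated command format are hypothetical, not
taken from any particular Raft library. The only point is that the
log holds opaque bytes and the state machine alone gives them meaning.</p>

```python
class KVStateMachine:
    """Hypothetical state machine that interprets opaque Raft log commands."""

    def __init__(self):
        self.store = {}  # the in-memory hashtable

    def apply(self, command: bytes) -> bytes:
        # The Raft library never looks inside `command`; only this code does.
        parts = command.decode("utf-8").split(" ", 2)
        op = parts[0]
        if op == "SET" and len(parts) == 3:
            self.store[parts[1]] = parts[2]
            return b"OK"
        if op == "GET" and len(parts) == 2:
            return self.store.get(parts[1], "").encode("utf-8")
        return b"ERR unknown command"


# Every node applies the same committed log in the same order,
# so every node ends up with the same hashtable.
committed_log = [b"SET x 1", b"SET y hello world", b"GET x"]
node_a, node_b = KVStateMachine(), KVStateMachine()
results_a = [node_a.apply(c) for c in committed_log]
results_b = [node_b.apply(c) for c in committed_log]
```

<p>Because applying a command is deterministic, two nodes that apply
the same log reach the same state without ever talking to each other
about the state itself.</p>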
<p>All nodes in the cluster run the same Raft code (including the state
machine code you provide); communicating among themselves. Nodes elect
a semi-permanent leader that accepts all reads and writes from
clients. (Again, reads and writes are modeled as commands).</p>
<p>To commit a new command to the cluster, clients send the command to
all nodes in the cluster. Only the leader accepts this command, if
there is currently a leader. Clients retry until there is a leader
that accepts the command.</p>
<p>The leader appends the command to its log and makes sure to replicate
all commands in its log to followers in the same order. The leader
sends periodic heartbeat messages to all followers to prolong its term
as leader. If a follower hasn't heard from the leader within a period
of time, it becomes a candidate and requests votes from the cluster.</p>
<p>When a follower is asked to accept a new command from a leader, it
checks if its history is up-to-date with the leader. If it is not, the
follower rejects the request and asks the leader to send previous
commands to bring it up-to-date. It does this ultimately, in the worst
case of a follower that has lost all history, by going all the way
back to the very first command ever sent.</p>
<p>When a quorum (typically a majority) of nodes has accepted a command,
the leader marks the command as committed and applies the command to
its own state machine. When followers learn about newly committed
commands, they also apply committed commands to their own state machine.</p>
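<p>One way to picture the commit rule is as a small calculation over
how far each node's log has gotten. The sketch below is a
simplification with hypothetical names (<code>commit_index</code>,
<code>follower_match_indexes</code>). It also leaves out the Raft paper's
additional requirement that the entry being committed belongs to the
leader's current term.</p>

```python
def quorum_size(cluster_size):
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1


def commit_index(leader_last_index, follower_match_indexes):
    """Highest log index that is stored on at least a majority of nodes.

    Simplified: ignores the current-term check that real Raft requires.
    """
    indexes = sorted([leader_last_index] + list(follower_match_indexes), reverse=True)
    # The entry at position quorum-1 (descending) is held by `quorum` nodes.
    return indexes[quorum_size(len(indexes)) - 1]


# A 5-node cluster: the leader is at index 7, followers are at 7, 5, 3 and 2.
example = commit_index(7, [7, 5, 3, 2])
```

<p>In this example three of the five nodes have everything up to
index 5, so index 5 is the furthest point the leader can mark as
committed.</p>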
<p>For the most part, these details are graphically summarized in Figure
2 of the <a href="https://raft.github.io/raft.pdf">Raft paper</a>.</p>
<h3 id="availability-and-linearizability">Availability and linearizability</h3><p>Taking a step back, distributed consensus helps a group of nodes, a
cluster, agree on a value. A client of the cluster can treat a value
from the cluster as if the value was atomically written to and read
from a single thread. This property is called
<a href="https://jepsen.io/consistency/models/linearizable">linearizability</a>.</p>
<p>However, with distributed consensus, the client of the cluster has
better availability guarantees from the cluster than if the client
atomically wrote to or read from a single thread. A single thread that
crashes becomes unavailable. But some number <code>f</code> of nodes can crash in a
cluster implementing distributed consensus and the cluster will still
1) be available and 2) provide linearizable reads and writes.</p>
<p>That is: <b>distributed consensus solves the problem of high
availability for a system while remaining linearizable</b>.</p>
<p>Without distributed consensus you can still achieve high
availability. For example, a database might have two read
replicas. But a client reading from a read replica might get stale
data. Thus, this system (a database with two read replicas) is not
linearizable.</p>
<p>Without distributed consensus you can also try synchronous
replication. It would be very simple to do. Have a fixed leader and
require all nodes to acknowledge before committing. But the value here
is extremely limited. If a single node in the cluster goes down the
entire cluster is down.</p>
<p>You might think I'm proposing a strawman. We could simply designate a
permanent leader that handles all reads and writes; and require a
majority of nodes to commit a command before the leader responds to a
client. But in that case, what's the process for getting a lagging
follower up-to-date? And what happens if it is the leader who goes
down?</p>
<p>Well, these are not trivial problems! And, beyond linearizability that
we already mentioned, these problems are exactly what distributed
consensus solves.</p>
<h3 id="why-does-linearizability-matter?">Why does linearizability matter?</h3><p>It's very nice, and often even critical, to have a highly available
system that will never give you stale data. And regardless, it's
convenient to have a term for what we might naively think of as the
"correct" way you'd always want to set and get a value.</p>
<p>So linearizability is a convenient way of thinking about complex
systems, if you can use or build a system that supports it. But it's
not the only consistency approach you'll see in the wild.</p>
<p>As you increase the guarantees of your consistency model, you tend to
sacrifice performance. Going the opposite direction, some production
systems sacrifice consistency to improve performance. For example, you
might allow stale reads from any node, reading only from local state
and avoiding consensus, so that you can reduce load on a leader and
avoid the overhead of consensus.</p>
<p>There are formal definitions for lower consistency models, including
sequential and read-your-writes. You can read the <a href="https://jepsen.io/consistency">Jepsen
page</a> for more detail.</p>
<h3 id="best-and-worst-case-scenarios">Best and worst case scenarios</h3><p>A distributed system relies on communicating over the network. The
worse the network, whether in terms of latency or reliability, the
longer it will take for communication to happen.</p>
<p>Aside from the network, disks can misdirect writes or corrupt data. Or
your disk could itself be network-attached storage such as EBS.</p>
<p>And processes themselves can crash due to low disk space or the OOM
killer.</p>
<p>It will take longer to achieve consensus to commit messages in these
scenarios. If messages take longer to reach nodes, or if nodes are
constantly crashing, followers will time out more often, triggering
leader election. And the leader election itself (which also requires
consensus) will also take longer.</p>
<p>The best case scenario for distributed consensus is where the network
is reliable and low-latency. Where disks are reliable and fast. And
where processes don't often crash.</p>
<p>TigerBeetle has an incredible <a href="https://sim.tigerbeetle.com/">visual
simulator</a> that demonstrates what
happens across ever-worsening environments. While TigerBeetle and this
simulator use Viewstamped Replication, the demonstrated principles
apply to Raft as well.</p>
<h3 id="what-happens-when-you-add-nodes?">What happens when you add nodes?</h3><p>Distributed consensus algorithms make sure that some minimum number of
nodes in a cluster agree before continuing. The minimum number is
proportional to the total number of nodes in the cluster.</p>
<p>A typical implementation of Raft for example will require 3 nodes in a
5-node cluster to agree before continuing. 4 nodes in a 7-node
cluster. And so on.</p>
<p>Recall that the p99 latency for a service is at least as bad as the
slowest external request the service must make. As you increase the
number of nodes you must talk to in a consensus cluster, you increase
the chance of a slow request.</p>
<p>Consider the extreme case of a 101-node cluster requiring 51 nodes to
respond before returning to the client. That's 51 chances for a slower
request, compared to 4 chances in a 7-node cluster. The 101-node
cluster is certainly more highly available though! It can tolerate 50
nodes going down. The 7-node cluster can only tolerate 3 nodes going
down. The scenario where 50 nodes go down (assuming they're in
different availability zones) seems pretty unlikely!</p>
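<p>To put a rough number on "chances for a slower request", here is a
back-of-the-envelope sketch. It assumes each request is independently
slow with the same probability, and that you need every one of the
responses you are counting. Both are simplifications, and the 1%
figure is made up for illustration.</p>

```python
def chance_of_any_slow(requests, p_slow):
    """Probability that at least one of `requests` independent requests is slow."""
    return 1 - (1 - p_slow) ** requests


# Hypothetical: any single request has a 1% chance of being slow.
seven_node = chance_of_any_slow(4, 0.01)
hundred_one_node = chance_of_any_slow(51, 0.01)
print(f"{seven_node:.3f} {hundred_one_node:.3f}")
```

<p>Under these assumptions, waiting on 4 responses hits a slow one
roughly 4% of the time, while waiting on 51 responses hits a slow one
roughly 40% of the time.</p>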
<h3 id="horizontal-scaling-with-distributed-consensus?-not-exactly">Horizontal scaling with distributed consensus? Not exactly</h3><p>All of this is to say that the most popular algorithms for distributed
consensus, on their own, have nothing to do with horizontal scaling.</p>
<p>The way that horizontally scaling databases like Cockroach or Yugabyte
or Spanner work is by sharding the data, transparent to the
client. Within each shard data is replicated with a dedicated
distributed consensus cluster.</p>
<p>So, yes, distributed consensus can be a <em>part</em> of horizontal
scaling. But again what distributed consensus primarily solves is high
availability via replication while remaining linearizable.</p>
<p>This is not a trivial point to
make. <a href="https://web.archive.org/web/20230327030543/https://etcd.io/docs/v3.2/learning/why/#using-etcd-for-metadata">etcd</a>,
<a href="https://web.archive.org/web/20231212132325/https://www.hashicorp.com/resources/operating-and-running-consul-at-scale">consul</a>,
and <a href="https://github.com/rqlite/rqlite">rqlite</a> are examples of
databases that do not do sharding, only replication, via a single
Raft cluster that replicates all data for the entire system.</p>
<p>For these databases there is no horizontal scaling. If they support
"horizontal scaling", they support this by doing non-linearizable
(stale) reads. Writes remain a challenge.</p>
<p>This doesn't mean these databases are bad. They are not. One obvious
advantage they have over Cockroach or Spanner is that they are
conceptually simpler. Conceptually simpler often equates to easier to
operate. That's a big deal.</p>
<h3 id="optimizations">Optimizations</h3><p>We've covered the basics of operation, but real-world implementations
get more complex.</p>
<h4 id="snapshots">Snapshots</h4><p>Rather than letting the log grow indefinitely, most libraries
implement snapshotting. The user of the library provides a state
machine and also provides a method for serializing the state machine
to disk. The Raft library periodically serializes the state machine to
disk and truncates the log.</p>
<p>When a follower is so far behind that the leader no longer has a log
entry (because it has been truncated), the leader transfers an entire
snapshot to the follower. Then once the follower is caught up on
snapshots, the leader can transfer normal log entries again.</p>
<p>This technique is described in the Raft paper. While it isn't
necessary for Raft to work, it's so important that it is hardly an
optimization and more a required part of a production Raft system.</p>
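<p>Here is a toy sketch of that bookkeeping. The
<code>SnapshottingLog</code> class is hypothetical and heavily simplified:
it treats every appended entry as already committed, serializes the
state machine as JSON, and keeps the "snapshot on disk" in memory.</p>

```python
import json


class SnapshottingLog:
    """Toy leader-side log with snapshotting. Log indexes are 1-based."""

    def __init__(self):
        self.state = {}                # the state machine (a plain dict here)
        self.entries = []              # entries not yet covered by a snapshot
        self.snapshot = None           # serialized state machine
        self.snapshot_last_index = 0   # last log index included in the snapshot

    def append_and_apply(self, key, value):
        # Simplification: every entry is treated as committed immediately.
        self.entries.append((key, value))
        self.state[key] = value

    def take_snapshot(self):
        # Serialize the state machine, then truncate the log it covers.
        self.snapshot = json.dumps(self.state, sort_keys=True)
        self.snapshot_last_index += len(self.entries)
        self.entries = []

    def catch_up(self, follower_next_index):
        """What the leader sends to a follower that needs `follower_next_index`."""
        if follower_next_index <= self.snapshot_last_index:
            # The entry was truncated away, so send the whole snapshot.
            return ("snapshot", self.snapshot)
        offset = follower_next_index - self.snapshot_last_index - 1
        return ("entries", self.entries[offset:])


leader = SnapshottingLog()
leader.append_and_apply("a", 1)  # index 1
leader.append_and_apply("b", 2)  # index 2
leader.append_and_apply("a", 3)  # index 3
leader.take_snapshot()           # snapshot covers indexes 1-3
leader.append_and_apply("c", 4)  # index 4
far_behind = leader.catch_up(2)
nearly_current = leader.catch_up(4)
```

<p>The follower that still needs index 2 gets the snapshot, since that
entry no longer exists in the log. The follower that needs index 4
gets ordinary log entries.</p>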
<h4 id="batching">Batching</h4><p>Rather than limiting clients of the cluster to submitting only one
command at a time, it's common for the cluster to accept many commands
at a time. Similarly, many commands at a time are submitted to
followers. When any node needs to write commands to disk, it can batch
commands to disk as well.</p>
<p>But you can go a step beyond this in a way that is completely opaque
to the Raft library. Each opaque command the client submits can <em>also</em>
contain a batch of messages. In this scenario, only the user-provided
state machine needs to be aware that each command it receives is
actually a batch of messages that it should pull apart and interpret
separately.</p>
<p>This latter technique is a fairly trivial way to increase throughput
by an order of magnitude or two.</p>
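<p>A minimal sketch of that second kind of batching, assuming a
length-prefixed framing that I picked for illustration (any framing
the state machine understands would do). The Raft library would see
only the single <code>bytes</code> value returned by
<code>encode_batch</code>.</p>

```python
import struct


def encode_batch(messages):
    """Pack many messages into one opaque command.

    Each message is prefixed with its length as a 4-byte big-endian integer.
    """
    return b"".join(struct.pack(">I", len(m)) + m for m in messages)


def decode_batch(command):
    """Split one opaque command back into its messages (state machine side)."""
    messages = []
    offset = 0
    while offset < len(command):
        (length,) = struct.unpack_from(">I", command, offset)
        offset += 4
        messages.append(command[offset:offset + length])
        offset += length
    return messages


batch = [b"SET a 1", b"", b"SET b 2"]
command = encode_batch(batch)      # what gets submitted to the Raft library
unpacked = decode_batch(command)   # what the state machine pulls apart
```

<p>One round of consensus now commits three messages instead of one,
and nothing in the consensus layer had to change.</p>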
<h4 id="disk-and-network">Disk and network</h4><p>In terms of how data is stored on disk and how data is sent over the
network there is obvious room for optimization.</p>
<p>A naive implementation might store JSON on disk and send JSON over the
network. A slightly more optimized implementation might store binary
data on disk and send binary data over the network.</p>
<p>Similarly you can swap out your RPC for gRPC or introduce zlib for
compression to network or disk.</p>
<p>You can swap out synchronous IO for libaio or io_uring or SPDK/DPDK.</p>
<p>A little tweak I made in my latest Raft implementation was to index
log entries so searching the log was not a linear operation. Another
little tweak was to introduce a page cache to eliminate unnecessary
disk reads. This increased throughput by an order of magnitude.</p>
<h4 id="flexible-quorums">Flexible quorums</h4><p>This brilliant <a href="https://arxiv.org/pdf/1608.06696.pdf">optimization</a> by
Heidi Howard and co. shows you can relax the quorum required for
committing new commands so long as you increase the quorum required
for electing a leader.</p>
<p>In an environment where leader election doesn't happen often, flexible
quorums can increase throughput and decrease latency. And it's a
pretty easy change to make!</p>
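<p>The core condition is that any quorum used for electing a leader
must overlap with any quorum used for committing. With quorums
defined purely by size, that comes down to one inequality. A small
sketch, with function names of my own:</p>

```python
def quorums_intersect(cluster_size, commit_quorum, election_quorum):
    """True if every commit quorum shares a node with every election quorum."""
    return commit_quorum + election_quorum > cluster_size


def min_election_quorum(cluster_size, commit_quorum):
    """Smallest election quorum that still overlaps every commit quorum."""
    return cluster_size - commit_quorum + 1


# In a 5-node cluster, committing with only 2 nodes needs 4 nodes to elect.
relaxed_ok = quorums_intersect(5, commit_quorum=2, election_quorum=4)
relaxed_bad = quorums_intersect(5, commit_quorum=2, election_quorum=3)
needed = min_election_quorum(5, commit_quorum=2)
```

<p>The trade is visible in the numbers: commits wait on fewer nodes,
but an election can only succeed while more nodes are up.</p>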
<h4 id="more">More</h4><p>These are just a couple common optimizations. You can also read about
<a href="https://www.pingcap.com/blog/optimizing-raft-in-tikv/">parallel state machine
apply</a>,
<a href="https://www.pingcap.com/blog/optimizing-raft-in-tikv/">parallel append to
disk</a>,
witnesses,
<a href="https://vldb.org/pvldb/vol14/p2203-whittaker.pdf">compartmentalization</a>,
and leader leases. TiKV, Scylla, RedPanda, and Cockroach tend to have
public material talking about this stuff.</p>
<p>There are also a few people I follow who are often reviewing relevant
papers, if they are not producing their own. I encourage you to follow
them too if this is interesting to you:</p>
<ul>
<li><a href="https://muratbuffalo.blogspot.com/">https://muratbuffalo.blogspot.com/</a></li>
<li><a href="https://charap.co/">https://charap.co/</a></li>
<li><a href="https://brooker.co.za/blog/">https://brooker.co.za/blog/</a></li>
<li><a href="https://distributed-computing-musings.com/">https://distributed-computing-musings.com/</a></li>
</ul>
<h3 id="safety-and-testing">Safety and testing</h3><p>The other aspect to consider is safety. For example, checksums for
everything written to disk and passed over the network; or <a href="https://www.usenix.org/conference/fast18/presentation/alagappan">being able
to
recover</a>
from corruption in the log.</p>
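<p>As a sketch of the checksum part, here is one way a log entry
could be framed on disk so that corruption is detected on read. The
record layout (length, CRC32, payload) is an assumption for
illustration, not the format of any particular implementation.
Detecting corruption is also only the first half, since recovering
from it is a separate problem.</p>

```python
import struct
import zlib


def encode_entry(payload):
    """Frame a log entry as: 4-byte length, 4-byte CRC32, then the payload."""
    return struct.pack(">II", len(payload), zlib.crc32(payload)) + payload


def decode_entry(record):
    """Return the payload, or raise ValueError if the record is damaged."""
    length, checksum = struct.unpack_from(">II", record, 0)
    payload = record[8:8 + length]
    if len(payload) != length or zlib.crc32(payload) != checksum:
        raise ValueError("corrupt log entry")
    return payload


record = encode_entry(b"SET x 1")
roundtrip = decode_entry(record)

# Simulate a bit flip on disk in the last byte of the payload.
damaged = bytearray(record)
damaged[-1] ^= 0xFF
try:
    decode_entry(bytes(damaged))
    corruption_detected = False
except ValueError:
    corruption_detected = True
```

<p>The same framing works for messages sent over the network.</p>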
<p>Testing is also a big deal. There are prominent tools like
<a href="https://jepsen.io/">Jepsen</a> that check for consistency in the face of
fault injection (process failure, network failure, etc.). But even
Jepsen has its limits. For example, it doesn't test disk failure.</p>
<p>FoundationDB <a href="https://www.youtube.com/watch?v=4fFDFbi3toc">made
popular</a> a number of
testing techniques. And the people behind this testing went on to
build a product, <a href="https://antithesis.com/">Antithesis</a>, around deterministic
testing of non-deterministic code while injecting faults.</p>
<p>And on that topic there's Facebook Experimental's
<a href="https://github.com/facebookexperimental/hermit">Hermit</a> deterministic
Linux hypervisor that may help to test complex distributed
systems. However, my experience with it has not been great and the
maintainers do not seem very engaged with other people who have
reported bugs. I'm hopeful for it but we'll see.</p>
<p>Antithesis and Hermit seem like a boon when half the trouble of
working on distributed consensus implementations is avoiding flakey
tests.</p>
<p>Another promising avenue is emitting logs during the Raft lifecycle
and validating the logs against a TLA+ spec. Microsoft has such a
project that has <a href="https://github.com/etcd-io/raft/issues/111">begun to see
adoption</a> among
open-source Raft implementations.</p>
<h3 id="conclusion">Conclusion</h3><p>Everything aside, consensus is expensive. There is overhead to the
entire consensus process. So if you do not need this level of
availability and can settle for some recovery process via backups, a
system without consensus is going to have lower latency and higher
throughput than one that has to go through distributed consensus.</p>
<p>If you do need high availability, distributed consensus can be a great
choice. But consider the environment and what you want from your
consensus algorithm.</p>
<p>Also, while MultiPaxos, Raft, and Viewstamped Replication are some of
the most popular algorithms for distributed consensus, there is a
world beyond. Two-phase commit, ZooKeeper Atomic Broadcast, PigPaxos,
EPaxos, Accord by Cassandra. The world of distributed consensus also
gets especially weird and interesting outside of OLTP systems.</p>
<p>But that's enough for one post.</p>
<h3 id="further-reading">Further reading</h3><ul>
<li><a href="https://raft.github.io/raft.pdf">The Raft Paper</a></li>
<li><a href="https://github.com/ongardie/raft.tla/blob/master/raft.tla">The Raft TLA+ Spec</a></li>
<li><a href="https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf">The Raft Author's PhD Thesis on Raft</a></li>
<li><a href="https://dataintensive.net/">Designing Data-Intensive Applications</a></li>
<li><a href="https://dabeaz.com/raft.html">David Beazley's Raft Course</a> if you can get your company to pay for it</li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a post about building an intuition for distributed consensus in OLTP systems!<br><br>Very grateful to all the folks who reviewed.<a href="https://t.co/wMxUuokKeg">https://t.co/wMxUuokKeg</a> <a href="https://t.co/cfY2kdfqak">pic.twitter.com/cfY2kdfqak</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1755580821476397527?ref_src=twsrc%5Etfw">February 8, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-02-08-an-intuition-for-distributed-consensus-in-oltp-systems.htmlThu, 08 Feb 2024 00:00:00 +0000
- Writing a minimal in-memory storage engine for MySQL/MariaDBhttp://notes.eatonphil.com/2024-01-09-minimal-in-memory-storage-engine-for-mysql.html<p><!-- -*- mode: markdown -*- --></p>
<p>I <a href="https://eatonphil.com/2024-01-wehack-mysql.html">spent a week</a>
looking at MySQL/MariaDB internals along with ~80 other devs. Although
MySQL and MariaDB are mostly the same (more on that later), I focused
on MariaDB specifically this week.</p>
<p>Before last week I had never built MySQL/MariaDB. The first day
of this hack week, I got MariaDB building locally and <a href="https://twitter.com/eatonphil/status/1742649922791395501">made a code
tweak</a> so
that <code>SELECT 23</code> returned <code>213</code>, and <a href="https://twitter.com/eatonphil/status/1742654868085526896">another
tweak</a> so
that <code>SELECT 80 + 20</code> returned <code>60</code>. The second day I got a <a href="https://twitter.com/eatonphil/status/1742958892957446490">basic UDF
in C</a>
working so that <code>SELECT mysum(20, 30)</code> returned <code>50</code>.</p>
<p>The rest of the week I spent figuring out how to build a minimal
in-memory storage engine, which I'll walk through in this post. 218 lines
of C++.</p>
<p>It supports <code>CREATE</code>, <code>DROP</code>, <code>INSERT</code>,
and <code>SELECT</code> for tables that only have <code>INTEGER</code> fields. It is
explicitly not thread-safe because I didn't have time to understand
MariaDB's lock primitives.</p>
<p>In this post I'll also talk about how the MariaDB custom storage API
compares to the Postgres one, based on <a href="https://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html">a previous hack week project I
did</a>.</p>
<p>All code for this post can be found in <a href="https://github.com/eatonphil/mariadb/tree/11.4/storage/memem">my fork on
GitHub</a>.</p>
<h3 id="mysql-and-mariadb">MySQL and MariaDB</h3><p>Before we go further though, why do I keep saying MySQL/MariaDB?</p>
<p>MySQL is GPL licensed (let's completely ignore the commercial
variations of MySQL that Oracle offers). The code is
open-source. However, the development is done behind closed
doors. There is a code dump <a href="https://github.com/mysql/mysql-server/commits/trunk/">every
month</a> or so.</p>
<p>MariaDB is a fork of MySQL by the creator of MySQL (who is no longer
involved, as it happens). It is also GPL licensed (let's completely
ignore the commercial variations of MariaDB that MariaDB Corporation
offers). The code is open-source. The development is also open-source.</p>
<p>When you install "MySQL" in your Linux distro you are <a href="https://mariadb.com/newsroom/press-releases/mariadb-replaces-mysql-as-the-default-in-debian-9/">often
actually</a>
installing MariaDB.</p>
<p>The two are mostly compatible. During this week, I <a href="https://twitter.com/eatonphil/status/1742642758408405237">stumbled
onto</a> the fact that
they evolved support for <code>SELECT .. FROM VALUES ..</code> differently. Some
differences are documented on <a href="https://mariadb.com/kb/en/moving-from-mysql/">the MariaDB
KB</a>. But this KB is
painful to browse. Which leads me to my next point.</p>
<p>The <a href="https://dev.mysql.com/doc/">MySQL docs</a> are excellent. Easy to
read, browse; and they are thorough. The <a href="https://mariadb.com/kb">MariaDB
docs</a> are a work in progress. I'm sorry I
can't be stoic: in just a week I've come to really hate using this
KB. Thankfully, in some twisted way, it also doesn't seem to be very
thorough either. It isn't completely avoidable though since there is
no guarantee MySQL and MariaDB do the same thing.</p>
<p>Ultimately, I spent the week using MariaDB because I'm biased toward
fully open projects. But I kept having to look at MySQL docs, hoping
they were relevant.</p>
<p>Now that you understand the state of things, let's move on to fun
stuff!</p>
<h3 id="storage-engines">Storage engines</h3><p>Mature databases often support swapping out the storage layer. Maybe
you want an in-memory storage layer so that you can quickly run
integration tests. Maybe you want to switch between B-Trees
(read-optimized) and LSM Trees (write-optimized) and unordered heaps
(write-optimized) depending on your workload. Or maybe you just want
to try a third-party storage library
(e.g. <a href="https://rocksdb.org/">RocksDB</a> or <a href="https://sled.rs/">Sled</a> or
<a href="https://tikv.org/">TiKV</a>).</p>
<p>The benefit of swapping out only the storage engine is that, from a
user's perspective, the semantics and features of the database stay
mostly the same. But the database is magically faster for a workload.</p>
<p>You keep powerful user management, extension support, SQL support, and
a well-known wire protocol. You modify only the method of storing the
actual data.</p>
<h4 id="existing-storage-engines">Existing storage engines</h4><p>MySQL/MariaDB is particularly well known for its custom storage engine
support. The MySQL docs for <a href="https://dev.mysql.com/doc/refman/8.0/en/storage-engines.html">alternate storage
engines</a>
are great.</p>
<p>While the docs do warn that you should probably stick with the default
storage engine, that warning didn't quite feel strong enough because
nothing else seemed to indicate the state of other engines.</p>
<p>Specifically, in the past I was always interested in the CSV storage
engine. But when you look at the <a href="https://github.com/MariaDB/server/blob/11.4/storage/csv/ha_tina.cc">actual code for the CSV
engine</a>
there is a pretty strong warning:</p>
<div class="highlight"><pre><span></span>First off, this is a play thing for me, there are a number of things
wrong with it:
*) It was designed for csv and therefore its performance is highly
questionable.
*) Indexes have not been implemented. This is because the files can
be traded in and out of the table directory without having to worry
about rebuilding anything.
*) NULLs and "" are treated equally (like a spreadsheet).
*) There was in the beginning no point to anyone seeing this other
then me, so there is a good chance that I haven't quite documented
it well.
*) Less design, more "make it work"
Now there are a few cool things with it:
*) Errors can result in corrupted data files.
*) Data files can be read by spreadsheets directly.
TODO:
*) Move to a block system for larger files
*) Error recovery, its all there, just need to finish it
*) Document how the chains work.
-Brian
</pre></div>
<p>The difference between the seeming confidence of the docs and seeming
confidence of the contributor made me chuckle.</p>
<p>The benefit of these diverse storage engines for me was that they give
examples of how to implement the storage engine API. The
<a href="https://github.com/MariaDB/server/blob/11.4/storage/csv">csv</a>,
<a href="https://github.com/MariaDB/server/tree/11.4/storage/blackhole">blackhole</a>,
<a href="https://github.com/MariaDB/server/tree/11.4/storage/example">example</a>,
and <a href="https://github.com/MariaDB/server/tree/11.4/storage/heap">heap</a>
storage engines were particularly helpful to read.</p>
<p>The heap engine is a complete in-memory storage engine. Complete means
complex though. So there seemed to be room for a stripped down version
of an in-memory engine.</p>
<p>And that's what we'll cover in this post! First though I want to talk a
little bit about the limitations of custom storage engines.</p>
<h3 id="limitations">Limitations</h3><p>While being able to tailor a storage engine to a workload is powerful,
there are limits to the benefits based on the design of the storage
API.</p>
<p>Both Postgres and MySQL/MariaDB currently have a custom storage API
built around <em>individual rows</em>.</p>
<h4 id="column-wise-execution">Column-wise execution</h4><p>I have <a href="https://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html">previously
written</a>
that custom storage engines allows you to switch between column- and
row-oriented data storage. Two big reasons to do column-wise storage
are 1) opportunity for compression, and 2) fast operations on a single
column.</p>
<p>The opportunity for 1) compression <em>on disk</em> would still exist even if
you needed to deal with individual rows at the storage API layer since
the compression could happen on disk. However any benefits of passing
around compressed columns <em>in memory</em> disappear if you must convert to
rows for the storage API.</p>
<p>You'd also lose the advantage for 2) fast operations on a single
column if the column must be converted into a row at the storage API
whereupon it's passed to higher levels that perform execution. The
execution would happen row-wise, not column-wise.</p>
<p>All of this is to say that while column-wise storage is possible, the
<em>benefit of doing so</em> is not obvious with the current API design for
both MySQL/MariaDB and Postgres.</p>
<h4 id="vectorization">Vectorization</h4><p>An API built around individual rows also sets limits on the amount of
vectorization you can do. A custom storage engine could still do some
vectorization under the hood: always filling a buffer with N rows and
returning a row from the buffer when the storage API requests a single
row. But there is likely some degree of performance left on the table
with an API that deals with individual rows.</p>
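<p>The buffering trick can be sketched independently of either
database. Below, <code>BufferedScan</code> and <code>fetch_batch</code>
are hypothetical names, and this is Python rather than the C or C++
a real engine would use. The shape is what matters: the caller asks
for one row at a time, while the storage side reads N rows at once.</p>

```python
class BufferedScan:
    """Row-at-a-time scan interface backed by batched reads."""

    def __init__(self, fetch_batch, batch_size):
        self.fetch_batch = fetch_batch  # callable(offset, n) returning a list of rows
        self.batch_size = batch_size
        self.buffer = []
        self.position = 0         # next row to hand out from the buffer
        self.offset = 0           # number of rows fetched from storage so far
        self.batches_fetched = 0

    def next_row(self):
        """Return the next row, or None at the end of the table."""
        if self.position == len(self.buffer):
            # Buffer exhausted: refill it with up to batch_size rows.
            self.buffer = self.fetch_batch(self.offset, self.batch_size)
            self.batches_fetched += 1
            self.offset += len(self.buffer)
            self.position = 0
            if not self.buffer:
                return None
        row = self.buffer[self.position]
        self.position += 1
        return row


table = [(i, i * i) for i in range(10)]
scan = BufferedScan(lambda offset, n: table[offset:offset + n], batch_size=4)

rows = []
while True:
    row = scan.next_row()
    if row is None:
        break
    rows.append(row)
```

<p>The caller made eleven single-row calls (ten rows plus the
end-of-table call) but the storage side only did four batched reads,
the last of which came back empty.</p>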
<p>Remember though: if you did batched reads and writes of rows in the
custom storage layer, there isn't necessarily any vectorization
happening at the execution layer. From a <a href="https://notes.eatonphil.com/2023-09-21-how-do-databases-execute-expressions.html">previous
study</a>
I did, neither MySQL/MariaDB nor Postgres do vectorized query
execution. This paragraph isn't a critique of the storage API, it's
just something to keep in mind.</p>
<h4 id="storage-versus-execution">Storage versus execution</h4><p>The general point I'm making here is that unless both the execution
and storage APIs are designed in a certain way, you may attempt
optimizations in the storage layer that are ineffective or even
harmful because the execution layer doesn't or can't take advantage
of them.</p>
<h4 id="nothing-permanent">Nothing permanent</h4><p>The current limitations of the storage API are not intrinsic aspects
of MySQL/MariaDB or Postgres's design. For both projects there used to
be no pluggable storage at all. We can imagine a future patch to
either project that allows support for batched row reads and writes
that together could make column-wise storage and vectorized execution
more feasible.</p>
<p>Even today there have been invasive attempts to fully support
<a href="https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-compression-for-postgres/">column-wise storage and
execution</a>
in Postgres. And there have also been projects to bring <a href="https://github.com/citusdata/postgres_vectorization_test">vectorized
execution to
Postgres</a>.</p>
<p>I'm not familiar enough with the MySQL landscape to comment on
efforts there at the moment.</p>
<h3 id="debug-build-of-mariadb-running-locally">Debug build of MariaDB running locally</h3><p>Now that you've got some background, let's get a debug build of
MariaDB!</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/MariaDB/server<span class="w"> </span>mariadb
<span class="gp">$ </span><span class="nb">cd</span><span class="w"> </span>mariadb
<span class="gp">$ </span>mkdir<span class="w"> </span>build
<span class="gp">$ </span><span class="nb">cd</span><span class="w"> </span>build
<span class="gp">$ </span>cmake<span class="w"> </span>-DCMAKE_BUILD_TYPE<span class="o">=</span>Debug<span class="w"> </span>..
<span class="gp">$ </span>make<span class="w"> </span>-j8
</pre></div>
<p>This takes a while. When I was hacking on Postgres (a C project), it
took 1 minute on my beefy Linux server to build. It took 20-30 minutes
to build MySQL/MariaDB from scratch. That's C++ for you!</p>
<p>Thankfully incremental builds of MySQL/MariaDB for a tweak after the
initial build take roughly the same time as incremental builds of
Postgres after a tweak.</p>
<p>Once the build is done, create a database.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>./build/scripts/mariadb-install-db<span class="w"> </span>--srcdir<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="w"> </span>--datadir<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/db
</pre></div>
<p>And create a config for the database.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"[client]</span>
<span class="go">socket=$(pwd)/mariadb.sock</span>
<span class="go">[mariadb]</span>
<span class="go">socket=$(pwd)/mariadb.sock</span>
<span class="go">basedir=$(pwd)</span>
<span class="go">datadir=$(pwd)/db</span>
<span class="go">pid-file=$(pwd)/db.pid" > my.cnf</span>
</pre></div>
<p>Start up the server.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>./build/sql/mariadbd<span class="w"> </span>--defaults-extra-file<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/my.cnf<span class="w"> </span>--debug:d:o,<span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/db.debug
<span class="go">./build/sql/mariadbd: Can't create file '/var/log/mariadb/mariadb.log' (errno: 13 "Permission denied")</span>
<span class="go">2024-01-03 17:10:15 0 [Note] Starting MariaDB 11.4.0-MariaDB-debug source revision 3fad2b115569864d8c1b7ea90ce92aa895cfef08 as process 185550</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Compressed tables use zlib 1.2.13</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Number of transaction pools: 1</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Initializing buffer pool, total size = 128.000MiB, chunk size = 2.000MiB</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Completed initialization of buffer pool</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: End of log at LSN=57155</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Opened 3 undo tablespaces</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: log sequence number 57155; transaction id 16</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Loading buffer pool(s) from ./db/ib_buffer_pool</span>
<span class="go">2024-01-03 17:10:15 0 [Note] Plugin 'FEEDBACK' is disabled.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] Plugin 'wsrep-provider' is disabled.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Buffer pool(s) load completed at 240103 17:10:15</span>
<span class="go">2024-01-03 17:10:15 0 [Note] Server socket created on IP: '0.0.0.0'.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] Server socket created on IP: '::'.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] mariadbd: Event Scheduler: Loaded 0 events</span>
<span class="go">2024-01-03 17:10:15 0 [Note] ./build/sql/mariadbd: ready for connections.</span>
<span class="go">Version: '11.4.0-MariaDB-debug' socket: './mariadb.sock' port: 3306 Source distribution</span>
</pre></div>
<p class="note">
With that <code>--debug</code> flag, debug logs will show up in
<code>$(pwd)/db.debug</code>. It's unclear why debug logs are
treated separately from the console logs shown here. I'd rather they
all be in one place.
</p><p>In another terminal, run a client and make a request!</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>./build/client/mariadb<span class="w"> </span>--defaults-extra-file<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/my.cnf<span class="w"> </span>--database<span class="o">=</span><span class="nb">test</span>
<span class="go">Reading table information for completion of table and column names</span>
<span class="go">You can turn off this feature to get a quicker startup with -A</span>
<span class="go">Welcome to the MariaDB monitor. Commands end with ; or \g.</span>
<span class="go">Your MariaDB connection id is 3</span>
<span class="go">Server version: 11.4.0-MariaDB-debug Source distribution</span>
<span class="go">Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.</span>
<span class="go">Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.</span>
<span class="go">MariaDB [test]> SELECT 1;</span>
<span class="go">+---+</span>
<span class="go">| 1 |</span>
<span class="go">+---+</span>
<span class="go">| 1 |</span>
<span class="go">+---+</span>
<span class="go">1 row in set (0.001 sec)</span>
</pre></div>
<p>Huzzah! Let's write a custom storage engine!</p>
<h3 id="where-does-the-code-go?">Where does the code go?</h3><p>When writing an extension for some project, I usually expect to have
the extension exist in its own repo. I was able to do this with the
<a href="https://github.com/eatonphil/pgtam">Postgres in-memory storage engine I
wrote</a>. And in general, Postgres
extensions exist as their own repos.</p>
<p>I was able to create and build a UDF plugin outside the MariaDB source
tree. But when it came to getting a storage engine to build and load
successfully, I wasted almost an entire day (a large amount of time in
a single hack week) getting nowhere.</p>
<p>Extensions for MySQL/MariaDB are most easily built via the CMake
infrastructure within the repo. Surely there's <em>some</em> way to replicate
that infrastructure from outside the repo but I wasn't able to figure
it out within a day and didn't want to spend more time on it.</p>
<p>Apparently the <a href="https://twitter.com/kastauyra/status/1743346665442935174">normal thing to
do</a> in
MySQL/MariaDB is to keep extensions within a fork of MySQL/MariaDB.</p>
<p>When I switched to this method I was able to very quickly get the
storage engine building and loaded. So that's what we'll do.</p>
<h3 id="boilerplate">Boilerplate</h3><p>Within the MariaDB source tree, create a new folder in the <code>storage</code>
subdirectory.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>mkdir<span class="w"> </span>storage/memem
</pre></div>
<p>Within <code>storage/memem/CMakeLists.txt</code> add the following.</p>
<div class="highlight"><pre><span></span><span class="c"># Copyright (c) 2006, 2010, Oracle and/or its affiliates. All rights reserved.</span>
<span class="c"># </span>
<span class="c"># This program is free software; you can redistribute it and/or modify</span>
<span class="c"># it under the terms of the GNU General Public License as published by</span>
<span class="c"># the Free Software Foundation; version 2 of the License.</span>
<span class="c"># </span>
<span class="c"># This program is distributed in the hope that it will be useful,</span>
<span class="c"># but WITHOUT ANY WARRANTY; without even the implied warranty of</span>
<span class="c"># MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the</span>
<span class="c"># GNU General Public License for more details.</span>
<span class="c"># </span>
<span class="c"># You should have received a copy of the GNU General Public License</span>
<span class="c"># along with this program; if not, write to the Free Software</span>
<span class="c"># Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA</span>
<span class="nb">SET</span><span class="p">(</span><span class="s">MEMEM_SOURCES</span><span class="w"> </span><span class="s">ha_memem.cc</span><span class="w"> </span><span class="s">ha_memem.h</span><span class="p">)</span>
<span class="nb">MYSQL_ADD_PLUGIN</span><span class="p">(</span><span class="s">memem</span><span class="w"> </span><span class="o">${</span><span class="nv">MEMEM_SOURCES</span><span class="o">}</span><span class="w"> </span><span class="s">STORAGE_ENGINE</span><span class="p">)</span>
</pre></div>
<p>This hooks into the MySQL/MariaDB build infrastructure. So next time you
run <code>make</code> within the <code>build</code> directory we created above, it will also
build this project.</p>
<h3 id="the-storage-engine-class">The storage engine class</h3><p>It would be nice to see a way to extend MySQL in C (for one, because
it would then be easier to port to other languages). But all of the
builtin storage methods use classes. So we'll do that too.</p>
<p>The class we must implement is a subclass of
<a href="https://github.com/MariaDB/server/blob/11.4/sql/handler.h#L3200"><code>handler</code></a>. There
is a single <code>handler</code> instance per thread, corresponding to a single
running query. (Postgres gives each query its own process, MySQL gives
each query its own thread.) However, <code>handler</code> instances are reused
across different queries.</p>
<p>There are a number of virtual methods on <code>handler</code> we must implement
in our subclass. For most of them we'll do nothing: simply returning
immediately. These simple methods will be implemented in
<code>ha_memem.h</code>. The methods with more complex logic will be implemented
in <code>ha_memem.cc</code>.</p>
<p>Let's set up includes in <code>ha_memem.h</code>.</p>
<div class="highlight"><pre><span></span><span class="cm">/* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.</span>
<span class="cm"> This program is free software; you can redistribute it and/or modify</span>
<span class="cm"> it under the terms of the GNU General Public License as published by</span>
<span class="cm"> the Free Software Foundation; version 2 of the License.</span>
<span class="cm"> This program is distributed in the hope that it will be useful,</span>
<span class="cm"> but WITHOUT ANY WARRANTY; without even the implied warranty of</span>
<span class="cm"> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the</span>
<span class="cm"> GNU General Public License for more details.</span>
<span class="cm"> You should have received a copy of the GNU General Public License</span>
<span class="cm"> along with this program; if not, write to the Free Software</span>
<span class="cm"> Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA */</span>
<span class="cp">#ifdef USE_PRAGMA_INTERFACE</span>
<span class="cp">#pragma interface </span><span class="cm">/* gcc class implementation */</span>
<span class="cp">#endif</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"thr_lock.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"handler.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"table.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"sql_const.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><vector></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><memory></span>
</pre></div>
<p>Next we'll define structs for our in-memory storage.</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">uchar</span><span class="o">></span><span class="w"> </span><span class="n">MememRow</span><span class="p">;</span>
<span class="k">struct</span><span class="w"> </span><span class="nc">MememTable</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o"><</span><span class="n">MememRow</span><span class="o">>></span><span class="w"> </span><span class="n">rows</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">name</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">struct</span><span class="w"> </span><span class="nc">MememDatabase</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o"><</span><span class="n">MememTable</span><span class="o">>></span><span class="w"> </span><span class="n">tables</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
<p>Within <code>ha_memem.cc</code> we'll implement a global (not thread-safe)
<code>static MememDatabase*</code> that all <code>handler</code> instances will query when
requested. We need the definitions in the header file though because
we'll store the table currently being queried in the <code>handler</code>
subclass.</p>
<p>This is so that every call to <code>write_row</code> to write a single row or
call to <code>rnd_next</code> to read a single row does not need to look up the
in-memory table object N times within the same query.</p>
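<p>Here is a stand-alone sketch of that caching idea, assuming a linear search by table name. The structs mirror the ones above, but <code>find_table</code>, <code>CachingHandler</code>, <code>use_table</code> and the <code>lookups</code> counter are hypothetical names for illustration only. This is not the real <code>ha_memem.cc</code>, and <code>uchar</code> is redefined here only so the snippet compiles outside the MariaDB tree.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// MariaDB's headers provide uchar; defined here so the sketch stands alone.
typedef unsigned char uchar;
typedef std::vector<uchar> MememRow;

struct MememTable
{
  std::vector<std::shared_ptr<MememRow>> rows;
  std::shared_ptr<std::string> name;
};

struct MememDatabase
{
  std::vector<std::shared_ptr<MememTable>> tables;
};

// Hypothetical helper: linear search for a table by name.
static std::shared_ptr<MememTable> find_table(const MememDatabase &db,
                                              const std::string &name)
{
  for (const auto &t : db.tables)
    if (t->name && *t->name == name)
      return t;
  return nullptr;
}

// Hypothetical stand-in for the handler subclass. It resolves the table
// once and reuses the cached pointer for every row written afterwards.
struct CachingHandler
{
  MememDatabase &db;
  std::shared_ptr<MememTable> memem_table;
  unsigned lookups= 0; // counts how often we searched the database

  explicit CachingHandler(MememDatabase &db) : db(db) {}

  void use_table(const std::string &name)
  {
    lookups++;
    memem_table= find_table(db, name);
  }

  // Appends a copy of `len` bytes from `buf`. No lookup happens here.
  bool write_row(const uchar *buf, std::size_t len)
  {
    if (!memem_table)
      return false;
    memem_table->rows.push_back(std::make_shared<MememRow>(buf, buf + len));
    return true;
  }
};
```

<p>With this shape, inserting N rows costs one search of <code>tables</code> plus N appends, rather than N searches.</p>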
<p>And finally we'll define the subclass of <code>handler</code> and implementations
of trivial methods.</p>
<div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="nc">ha_memem</span><span class="w"> </span><span class="k">final</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="k">public</span><span class="w"> </span><span class="n">handler</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">current_position</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o"><</span><span class="n">MememTable</span><span class="o">></span><span class="w"> </span><span class="n">memem_table</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="k">public</span><span class="o">:</span>
<span class="w"> </span><span class="n">ha_memem</span><span class="p">(</span><span class="n">handlerton</span><span class="w"> </span><span class="o">*</span><span class="n">hton</span><span class="p">,</span><span class="w"> </span><span class="n">TABLE_SHARE</span><span class="w"> </span><span class="o">*</span><span class="n">table_arg</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">handler</span><span class="p">(</span><span class="n">hton</span><span class="p">,</span><span class="w"> </span><span class="n">table_arg</span><span class="p">)</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">~</span><span class="n">ha_memem</span><span class="p">()</span><span class="o">=</span><span class="w"> </span><span class="k">default</span><span class="p">;</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="nf">index_type</span><span class="p">(</span><span class="n">uint</span><span class="w"> </span><span class="n">key_number</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">ulonglong</span><span class="w"> </span><span class="nf">table_flags</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">ulong</span><span class="w"> </span><span class="nf">index_flags</span><span class="p">(</span><span class="n">uint</span><span class="w"> </span><span class="n">inx</span><span class="p">,</span><span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">part</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">all_parts</span><span class="p">)</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="cm">/* The following defines can be increased if necessary */</span>
<span class="cp">#define MEMEM_MAX_KEY MAX_KEY </span><span class="cm">/* Max allowed keys */</span>
<span class="cp">#define MEMEM_MAX_KEY_SEG 16 </span><span class="cm">/* Max segments for key */</span>
<span class="cp">#define MEMEM_MAX_KEY_LENGTH 3500 </span><span class="cm">/* Like in InnoDB */</span>
<span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="nf">max_supported_keys</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">MEMEM_MAX_KEY</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="nf">max_supported_key_length</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">MEMEM_MAX_KEY_LENGTH</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="nf">max_supported_key_part_length</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">MEMEM_MAX_KEY_LENGTH</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">open</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">mode</span><span class="p">,</span><span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">test_if_locked</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">close</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">truncate</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">rnd_init</span><span class="p">(</span><span class="kt">bool</span><span class="w"> </span><span class="n">scan</span><span class="p">);</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">rnd_next</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">);</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">rnd_pos</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">pos</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_read_map</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">key_part_map</span><span class="w"> </span><span class="n">keypart_map</span><span class="p">,</span>
<span class="w"> </span><span class="k">enum</span><span class="w"> </span><span class="nc">ha_rkey_function</span><span class="w"> </span><span class="n">find_flag</span><span class="p">)</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_read_idx_map</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">idx</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">key</span><span class="p">,</span>
<span class="w"> </span><span class="n">key_part_map</span><span class="w"> </span><span class="n">keypart_map</span><span class="p">,</span>
<span class="w"> </span><span class="k">enum</span><span class="w"> </span><span class="nc">ha_rkey_function</span><span class="w"> </span><span class="n">find_flag</span><span class="p">)</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_read_last_map</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">key</span><span class="p">,</span>
<span class="w"> </span><span class="n">key_part_map</span><span class="w"> </span><span class="n">keypart_map</span><span class="p">)</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_next</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_prev</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_first</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_last</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">position</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">record</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">info</span><span class="p">(</span><span class="n">uint</span><span class="w"> </span><span class="n">flag</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">external_lock</span><span class="p">(</span><span class="n">THD</span><span class="w"> </span><span class="o">*</span><span class="n">thd</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">lock_type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">create</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">TABLE</span><span class="w"> </span><span class="o">*</span><span class="n">table_arg</span><span class="p">,</span><span class="w"> </span><span class="n">HA_CREATE_INFO</span><span class="w"> </span><span class="o">*</span><span class="n">create_info</span><span class="p">);</span>
<span class="w"> </span><span class="n">THR_LOCK_DATA</span><span class="w"> </span><span class="o">**</span><span class="nf">store_lock</span><span class="p">(</span><span class="n">THD</span><span class="w"> </span><span class="o">*</span><span class="n">thd</span><span class="p">,</span><span class="w"> </span><span class="n">THR_LOCK_DATA</span><span class="w"> </span><span class="o">**</span><span class="n">to</span><span class="p">,</span>
<span class="w"> </span><span class="k">enum</span><span class="w"> </span><span class="nc">thr_lock_type</span><span class="w"> </span><span class="n">lock_type</span><span class="p">)</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">to</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">delete_table</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
<span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="n">reset_memem_table</span><span class="p">();</span>
<span class="w"> </span><span class="k">virtual</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">write_row</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">);</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">update_row</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">old_data</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">new_data</span><span class="p">)</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_WRONG_COMMAND</span><span class="p">;</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">delete_row</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_WRONG_COMMAND</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="p">};</span>
</pre></div>
<p>A complete storage engine would implement all of these methods for
real. We'll only give 7 of them a real implementation; the rest stay as stubs.</p>
<p>To finish up the boilerplate, we'll switch over to <code>ha_memem.cc</code> and
set up the includes.</p>
<div class="highlight"><pre><span></span><span class="cm">/* Copyright (c) 2005, 2012, Oracle and/or its affiliates. All rights reserved.</span>
<span class="cm"> This program is free software; you can redistribute it and/or modify</span>
<span class="cm"> it under the terms of the GNU General Public License as published by</span>
<span class="cm"> the Free Software Foundation; version 2 of the License.</span>
<span class="cm"> This program is distributed in the hope that it will be useful,</span>
<span class="cm"> but WITHOUT ANY WARRANTY; without even the implied warranty of</span>
<span class="cm"> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the</span>
<span class="cm"> GNU General Public License for more details.</span>
<span class="cm"> You should have received a copy of the GNU General Public License</span>
<span class="cm"> along with this program; if not, write to the Free Software</span>
<span class="cm"> Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA */</span>
<span class="cp">#ifdef USE_PRAGMA_IMPLEMENTATION</span>
<span class="cp">#pragma implementation </span><span class="c1">// gcc: Class implementation</span>
<span class="cp">#endif</span>
<span class="cp">#define MYSQL_SERVER 1</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><my_global.h></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"sql_priv.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"unireg.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"sql_class.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"ha_memem.h"</span>
</pre></div>
<p>Ok! Let's dig into the implementation.</p>
<h3 id="implementation">Implementation</h3><h4 id="the-global-database">The global database</h4><p>First up, we need to declare a global <code>MememDatabase*</code> instance. We'll
also implement a helper function for finding the index of a table by
name within the database.</p>
<div class="highlight"><pre><span></span><span class="c1">// WARNING! All accesses of `database` in this code are thread</span>
<span class="c1">// unsafe. Since this was written during a hack week, I didn't have</span>
<span class="c1">// time to figure out MySQL/MariaDB's runtime well enough to do the</span>
<span class="c1">// thread-safe version of this.</span>
<span class="k">static</span><span class="w"> </span><span class="n">MememDatabase</span><span class="w"> </span><span class="o">*</span><span class="n">database</span><span class="p">;</span>
<span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">memem_table_index</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="p">;</span>
<span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">database</span><span class="o">-></span><span class="n">tables</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">INT_MAX</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="n">database</span><span class="o">-></span><span class="n">tables</span><span class="p">.</span><span class="n">size</span><span class="p">();</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">database</span><span class="o">-></span><span class="n">tables</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-></span><span class="n">name</span><span class="o">-></span><span class="n">c_str</span><span class="p">(),</span><span class="w"> </span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">i</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">-1</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
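<p>For the thread-safety gap flagged in the comment above, here is a minimal
sketch of one possible direction: guard the table list with a plain
<code>std::mutex</code>. The <code>SketchTable</code>, <code>SketchDatabase</code>,
<code>sketch_table_index</code> and <code>sketch_add_table</code> names are
hypothetical stand-ins, and I have not checked whether this is how locking
should be done inside MariaDB's runtime.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for the post's MememTable and MememDatabase.
struct SketchTable
{
  std::string name;
};

struct SketchDatabase
{
  std::mutex lock; // guards `tables`
  std::vector<std::shared_ptr<SketchTable>> tables;
};

// Returns the index of the table called `name`, or -1 if there is none.
// The lock is held for the whole scan so a concurrent add cannot
// resize the vector while we walk it.
static int sketch_table_index(SketchDatabase &db, const std::string &name)
{
  std::lock_guard<std::mutex> guard(db.lock);
  for (std::size_t i= 0; i < db.tables.size(); i++)
  {
    if (db.tables[i]->name == name)
      return (int) i;
  }
  return -1;
}

// Appends a new, empty table while holding the same lock.
static void sketch_add_table(SketchDatabase &db, const std::string &name)
{
  auto t= std::make_shared<SketchTable>();
  t->name= name;
  std::lock_guard<std::mutex> guard(db.lock);
  db.tables.push_back(t);
}
```

<p>One caveat with this sketch: the returned index can go stale as soon as
the lock is released, so a fuller version would probably return the
<code>std::shared_ptr</code> itself while still holding the lock.</p>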
<p class="note">
As I wrote this post I noticed that this code also assumes there's
only a single database. That isn't how MySQL works. Every time you
call <code>USE ...</code> in MySQL you are switching between
databases. You can query tables across databases. A real in-memory
backend would need to be aware of the different databases, not just
different tables. But to keep the code succinct we won't implement
that in this post.
</p><p>Next we'll implement plugin initialization and cleanup.</p>
<h4 id="plugin-lifecycle">Plugin lifecycle</h4><p>Before we register the plugin with MariaDB, we need to set up
initialization and cleanup methods for it.</p>
<p>The initialization method will take care of initializing the global
<code>MememDatabase* database</code> object. It will set up a handler for
creating new instances of our <code>handler</code> subclass. And it will set up a
handler for deleting tables.</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="o">*</span><span class="nf">memem_create_handler</span><span class="p">(</span><span class="n">handlerton</span><span class="w"> </span><span class="o">*</span><span class="n">hton</span><span class="p">,</span><span class="w"> </span><span class="n">TABLE_SHARE</span><span class="w"> </span><span class="o">*</span><span class="n">table</span><span class="p">,</span>
<span class="w"> </span><span class="n">MEM_ROOT</span><span class="w"> </span><span class="o">*</span><span class="n">mem_root</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="p">(</span><span class="n">mem_root</span><span class="p">)</span><span class="w"> </span><span class="n">ha_memem</span><span class="p">(</span><span class="n">hton</span><span class="p">,</span><span class="w"> </span><span class="n">table</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">memem_init</span><span class="p">(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">handlerton</span><span class="w"> </span><span class="o">*</span><span class="n">memem_hton</span><span class="p">;</span>
<span class="w"> </span><span class="n">memem_hton</span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">handlerton</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="n">p</span><span class="p">;</span>
<span class="w"> </span><span class="n">memem_hton</span><span class="o">-></span><span class="n">db_type</span><span class="o">=</span><span class="w"> </span><span class="n">DB_TYPE_AUTOASSIGN</span><span class="p">;</span>
<span class="w"> </span><span class="n">memem_hton</span><span class="o">-></span><span class="n">create</span><span class="o">=</span><span class="w"> </span><span class="n">memem_create_handler</span><span class="p">;</span>
<span class="w"> </span><span class="n">memem_hton</span><span class="o">-></span><span class="n">drop_table</span><span class="o">=</span><span class="w"> </span><span class="p">[](</span><span class="n">handlerton</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="o">=</span><span class="w"> </span><span class="n">memem_table_index</span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_NO_SUCH_TABLE</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">database</span><span class="o">-></span><span class="n">tables</span><span class="p">.</span><span class="n">erase</span><span class="p">(</span><span class="n">database</span><span class="o">-></span><span class="n">tables</span><span class="p">.</span><span class="n">begin</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="n">DBUG_PRINT</span><span class="p">(</span><span class="s">"info"</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s">"[MEMEM] Deleted table '%s'."</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">));</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">memem_hton</span><span class="o">-></span><span class="n">flags</span><span class="o">=</span><span class="w"> </span><span class="n">HTON_CAN_RECREATE</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Initialize global in-memory database.</span>
<span class="w"> </span><span class="n">database</span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">MememDatabase</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p class="note">
The <code>DBUG_PRINT</code> macro is a debug helper MySQL/MariaDB gives us. As
noted above, the output is directed to a file specified by the
<code>--debug</code> flag. Unfortunately I couldn't figure out how to flush the
stream this macro writes to. Occasionally, after a segfault, logs I
expected to see were missing, and the file would often contain what
looked like partially written logs. Anyway, as long as there isn't a
segfault, the debug file will eventually contain the <code>DBUG_PRINT</code>
logs.
</p><p>The only thing the plugin cleanup function must do is delete the
global database.</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">memem_fini</span><span class="p">(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">p</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="k">delete</span><span class="w"> </span><span class="n">database</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
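<p>The lifecycle described above (init allocates the global, drop erases a
table by index, fini deletes the global) can be exercised outside the
server. This is only a sketch with hypothetical stand-in names
(<code>sketch_init</code>, <code>sketch_drop</code>, <code>sketch_fini</code>);
it returns <code>-1</code> where the real code returns
<code>HA_ERR_NO_SUCH_TABLE</code>, and resetting the pointer to
<code>nullptr</code> after deleting is my addition, not something the plugin
above does.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the post's MememTable and MememDatabase.
struct SketchTable
{
  std::string name;
};

struct SketchDatabase
{
  std::vector<std::shared_ptr<SketchTable>> tables;
};

static SketchDatabase *sketch_db= nullptr;

// Mirrors the role of memem_init: allocate the global database.
static int sketch_init()
{
  sketch_db= new SketchDatabase;
  return 0;
}

// Mirrors the role of the drop_table callback: erase a table by name.
static int sketch_drop(const char *name)
{
  for (std::size_t i= 0; i < sketch_db->tables.size(); i++)
  {
    if (sketch_db->tables[i]->name == name)
    {
      sketch_db->tables.erase(sketch_db->tables.begin() + (std::ptrdiff_t) i);
      return 0;
    }
  }
  return -1; // stand-in for HA_ERR_NO_SUCH_TABLE
}

// Mirrors the role of memem_fini: free the global database.
static int sketch_fini()
{
  delete sketch_db;
  sketch_db= nullptr; // assumption: guards against use after cleanup
  return 0;
}
```

<p>Because the tables are held through <code>std::shared_ptr</code>, erasing
an entry or deleting the database releases the table memory without any
extra bookkeeping, as long as no handler still holds a reference.</p>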
<p>Now we can register the plugin!</p>
<h4 id="plugin-registration">Plugin registration</h4><p>The <code>maria_declare_plugin</code> and <code>maria_declare_plugin_end</code> macros register the
plugin's metadata (name, version, etc.) and callbacks.</p>
<div class="highlight"><pre><span></span><span class="k">struct</span><span class="w"> </span><span class="nc">st_mysql_storage_engine</span><span class="w"> </span><span class="n">memem_storage_engine</span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">MYSQL_HANDLERTON_INTERFACE_VERSION</span><span class="p">};</span>
<span class="n">maria_declare_plugin</span><span class="p">(</span><span class="n">memem</span><span class="p">){</span>
<span class="w"> </span><span class="n">MYSQL_STORAGE_ENGINE_PLUGIN</span><span class="p">,</span>
<span class="w"> </span><span class="o">&</span><span class="n">memem_storage_engine</span><span class="p">,</span>
<span class="w"> </span><span class="s">"MEMEM"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"MySQL AB"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"In-memory database."</span><span class="p">,</span>
<span class="w"> </span><span class="n">PLUGIN_LICENSE_GPL</span><span class="p">,</span>
<span class="w"> </span><span class="n">memem_init</span><span class="p">,</span><span class="w"> </span><span class="cm">/* Plugin Init */</span>
<span class="w"> </span><span class="n">memem_fini</span><span class="p">,</span><span class="w"> </span><span class="cm">/* Plugin Deinit */</span>
<span class="w"> </span><span class="mh">0x0100</span><span class="w"> </span><span class="cm">/* 1.0 */</span><span class="p">,</span>
<span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="cm">/* status variables */</span>
<span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="cm">/* system variables */</span>
<span class="w"> </span><span class="s">"1.0"</span><span class="p">,</span><span class="w"> </span><span class="cm">/* string version */</span>
<span class="w"> </span><span class="n">MariaDB_PLUGIN_MATURITY_STABLE</span><span class="w"> </span><span class="cm">/* maturity */</span>
<span class="p">}</span><span class="w"> </span><span class="n">maria_declare_plugin_end</span><span class="p">;</span>
</pre></div>
<p>That's it! Now we need to implement methods for writing rows, reading
rows, and creating a new table.</p>
<h4 id="create-table">Create table</h4><p>To create a table, we make sure one by this name doesn't already
exist, make sure it only has <code>INTEGER</code> fields, allocate memory for the
table, and append it to the global database.</p>
<div class="highlight"><pre><span></span><span class="kt">int</span><span class="w"> </span><span class="nf">ha_memem::create</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">TABLE</span><span class="w"> </span><span class="o">*</span><span class="n">table_arg</span><span class="p">,</span>
<span class="w"> </span><span class="n">HA_CREATE_INFO</span><span class="w"> </span><span class="o">*</span><span class="n">create_info</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">memem_table_index</span><span class="p">(</span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// We only support INTEGER fields for now.</span>
<span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">table_arg</span><span class="o">-></span><span class="n">field</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">table_arg</span><span class="o">-></span><span class="n">field</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-></span><span class="n">type</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">MYSQL_TYPE_LONG</span><span class="p">)</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">DBUG_PRINT</span><span class="p">(</span><span class="s">"info"</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s">"Unsupported field type."</span><span class="p">));</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">t</span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o"><</span><span class="n">MememTable</span><span class="o">></span><span class="p">();</span>
<span class="w"> </span><span class="n">t</span><span class="o">-></span><span class="n">name</span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
<span class="w"> </span><span class="n">database</span><span class="o">-></span><span class="n">tables</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">t</span><span class="p">);</span>
<span class="w"> </span><span class="n">DBUG_PRINT</span><span class="p">(</span><span class="s">"info"</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s">"[MEMEM] Created table '%s'."</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">));</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
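<p>The field check relies on <code>table_arg->field</code> being a
NULL-terminated array of pointers. A small sketch of that walk, using
hypothetical stand-in types (<code>SketchField</code>, <code>SketchType</code>)
rather than MariaDB's <code>Field</code> class, might look like this:</p>

```cpp
#include <cassert>

// Hypothetical stand-ins for MariaDB's field type enum and Field class.
enum SketchType
{
  SKETCH_TYPE_LONG,
  SKETCH_TYPE_VARCHAR
};

struct SketchField
{
  SketchType type;
};

// Walks a NULL-terminated array of field pointers. Returns the number
// of fields if every one is an integer field, or -1 at the first
// unsupported type.
static int sketch_count_integer_fields(SketchField *const *fields)
{
  int n= 0;
  while (fields[n])
  {
    if (fields[n]->type != SKETCH_TYPE_LONG)
      return -1;
    n++;
  }
  return n;
}
```

<p>The same walk, minus the type check, is what gives you the field count
for a table when all you have is the array.</p>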
<p>Not very complicated. Let's handle <code>INSERT</code>-ing rows next.</p>
<h4 id="insert-row">Insert row</h4><p>There is no method called when an <code>INSERT</code> starts. However, the <code>table</code>
field on the <code>handler</code> parent class is updated whenever a
<code>SELECT</code> or <code>INSERT</code> is running. So we can fetch the current table from
that field.</p>
<p>Since we have a slot for a <code>std::shared_ptr<MememTable> memem_table</code>
on the <code>ha_memem</code> class, we can check if it is <code>NULL</code> when we insert a
row. If it is, we look up the current table and set
<code>this->memem_table</code> to its <code>MememTable</code>.</p>
<p>But there's a bit more to it than just the table name. The <code>const
char* name</code> passed to the <code>create()</code> method above seems to be a sort
of fully qualified name for the table. By observation, when creating a
table <code>y</code> in a database <code>test</code>, the <code>const char* name</code> value is
<code>./test/y</code>. The <code>.</code> prefix probably means that the database is local,
but I'm not sure.</p>
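<p>To make the observed naming concrete, here is a sketch that builds and
splits a name of the <code>./test/y</code> shape. Both
<code>sketch_qualified_name</code> and <code>sketch_split_name</code> are
hypothetical helpers, and the <code>./database/table</code> layout is only
what was observed above, not a documented contract.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a name of the observed form "./<database>/<table>".
static std::string sketch_qualified_name(const std::string &db,
                                         const std::string &table)
{
  return "./" + db + "/" + table;
}

// Splits "./<database>/<table>" back into its parts. Returns false if
// the name does not have the observed shape.
static bool sketch_split_name(const std::string &full, std::string &db,
                              std::string &table)
{
  if (full.compare(0, 2, "./") != 0)
    return false;
  std::size_t slash= full.find('/', 2);
  if (slash == std::string::npos)
    return false;
  db= full.substr(2, slash - 2);
  table= full.substr(slash + 1);
  return true;
}
```

<p>Keeping the database name inside the key is also what lets two tables
with the same name in different databases coexist in a single flat list.</p>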
<p>So we'll write a helper method that will reconstruct the fully
qualified table name before looking up that fully qualified table name
in the global database.</p>
<div class="highlight"><pre><span></span><span class="kt">void</span><span class="w"> </span><span class="nf">ha_memem::reset_memem_table</span><span class="p">()</span>
<span class="p">{</span>
<span class="w"> </span><span class="c1">// Reset table cursor.</span>
<span class="w"> </span><span class="n">current_position</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">full_name</span><span class="o">=</span><span class="w"> </span><span class="s">"./"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">(</span><span class="n">table</span><span class="o">-></span><span class="n">s</span><span class="o">-></span><span class="n">db</span><span class="p">.</span><span class="n">str</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">"/"</span><span class="w"> </span><span class="o">+</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">(</span><span class="n">table</span><span class="o">-></span><span class="n">s</span><span class="o">-></span><span class="n">table_name</span><span class="p">.</span><span class="n">str</span><span class="p">);</span>
<span class="w"> </span><span class="n">DBUG_PRINT</span><span class="p">(</span><span class="s">"info"</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s">"[MEMEM] Resetting to '%s'."</span><span class="p">,</span><span class="w"> </span><span class="n">full_name</span><span class="p">.</span><span class="n">c_str</span><span class="p">()));</span>
<span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">database</span><span class="o">-></span><span class="n">tables</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="o">=</span><span class="w"> </span><span class="n">memem_table_index</span><span class="p">(</span><span class="n">full_name</span><span class="p">.</span><span class="n">c_str</span><span class="p">());</span>
<span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="n">database</span><span class="o">-></span><span class="n">tables</span><span class="p">.</span><span class="n">size</span><span class="p">());</span>
<span class="w"> </span><span class="n">memem_table</span><span class="o">=</span><span class="w"> </span><span class="n">database</span><span class="o">-></span><span class="n">tables</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="p">}</span>
</pre></div>
<p>Then we can use this within <code>write_row</code> to figure out the current
<code>MememTable</code> being queried.</p>
<p>But first, let's digress into how MySQL stores rows.</p>
<h4 id="the-mysql-row-api">The MySQL row API</h4><p>When you <a href="https://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html">write a Postgres custom storage
API</a>,
you are expected to basically read from or write to an array of
<code>Datum</code>.</p>
<p>Totally sensible.</p>
<p>In MySQL, you read from and write to an array of bytes. That's pretty
weird to me. Of course you can build your own higher level
serialization/deserialization on top of it. But it's just strange to
me that everyone has to know this basically opaque API.</p>
<p>Certainly <a href="https://github.com/MariaDB/server/blob/11.4/sql/handler.h#L3152">it's documented</a>.</p>
<div class="highlight"><pre><span></span>The handler class is the interface for dynamically loadable
storage engines. Do not add ifdefs and take care when adding or
changing virtual functions to avoid vtable confusion
Functions in this class accept and return table columns data. Two data
representation formats are used:
1. TableRecordFormat - Used to pass [partial] table records to/from
storage engine
2. KeyTupleFormat - used to pass index search tuples (aka "keys") to
storage engine. See opt_range.cc for description of this format.
TableRecordFormat
=================
[Warning: this description is work in progress and may be incomplete]
The table record is stored in a fixed-size buffer:
record: null_bytes, column1_data, column2_data, ...
The offsets of the parts of the buffer are also fixed: every column has
an offset to its column{i}_data, and if it is nullable it also has its own
bit in null_bytes.
</pre></div>
<p>In our implementation, we'll skip support for <code>NULL</code> values and
only support <code>INTEGER</code> fields. But we still need to be aware that the
first byte of the record is taken up by the NULL bitmap. We'll also assume
there won't be more than one byte of NULL bitmap.</p>
<p>It is this opaque byte array that we'll read from in <code>write_row(const uchar*
buf)</code> and write to in <code>rnd_next(uchar* buf)</code>.</p>
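<p>As a rough illustration of the record layout under the assumptions
above (one byte of NULL bitmap, then only 4-byte integer columns), here is
a sketch that decodes such a buffer. The <code>sketch_decode_record</code>
helper is hypothetical, and the little-endian byte order is my assumption
about how the integers are laid out in the record; it is not confirmed by
the documentation quoted above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef unsigned char uchar;

// Decodes `field_count` 4-byte integers from a record buffer laid out as
// one null_bytes byte followed by the column data.
// Assumption: each integer is stored little-endian.
static std::vector<std::int32_t> sketch_decode_record(const uchar *record,
                                                      std::size_t field_count)
{
  std::vector<std::int32_t> values;
  const uchar *p= record + 1; // skip the single byte of NULL bitmap
  for (std::size_t i= 0; i < field_count; i++, p+= 4)
  {
    std::uint32_t v= (std::uint32_t) p[0] | ((std::uint32_t) p[1] << 8) |
                     ((std::uint32_t) p[2] << 16) |
                     ((std::uint32_t) p[3] << 24);
    values.push_back((std::int32_t) v);
  }
  return values;
}
```

<p>A table with nullable columns or more than eight of them would need the
NULL bitmap handled properly, which this sketch deliberately ignores.</p>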
<h4 id="insert-row-(take-two)">Insert row (take two)</h4><p>To keep things simple we're going to store the row in <code>MememTable</code> the
same way MySQL passes it around.</p>
<div class="highlight"><pre><span></span><span class="kt">int</span><span class="w"> </span><span class="nf">ha_memem::write_row</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">memem_table</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">reset_memem_table</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Assume there are no NULLs.</span>
<span class="w"> </span><span class="n">buf</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">field_count</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">table</span><span class="o">-></span><span class="n">field</span><span class="p">[</span><span class="n">field_count</span><span class="p">])</span><span class="w"> </span><span class="n">field_count</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Store the row in the same format MariaDB gives us.</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">row</span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">uchar</span><span class="o">>></span><span class="p">(</span>
<span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">field_count</span><span class="p">);</span>
<span class="w"> </span><span class="n">memem_table</span><span class="o">-></span><span class="n">rows</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">row</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Which makes reading the row quite simple too!</p>
<h4 id="read-row">Read row</h4><p>The only slight difference between reading and writing a row is that
MySQL/MariaDB will tell us when the <code>SELECT</code> scan for a table starts.</p>
<p>We'll use that opportunity to reset the <code>current_position</code> cursor and reset
the <code>memem_table</code> field since, again, <code>handler</code> instances are only used
once per query but are reused for queries running at other times.</p>
<div class="highlight"><pre><span></span><span class="kt">int</span><span class="w"> </span><span class="nf">ha_memem::rnd_init</span><span class="p">(</span><span class="kt">bool</span><span class="w"> </span><span class="n">scan</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">reset_memem_table</span><span class="p">();</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span><span class="w"> </span><span class="nf">ha_memem::rnd_next</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">current_position</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">memem_table</span><span class="o">-></span><span class="n">rows</span><span class="p">.</span><span class="n">size</span><span class="p">())</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Reset the in-memory table to make logic errors more obvious.</span>
<span class="w"> </span><span class="n">memem_table</span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">current_position</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">memem_table</span><span class="o">-></span><span class="n">rows</span><span class="p">.</span><span class="n">size</span><span class="p">());</span>
<span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">ptr</span><span class="o">=</span><span class="w"> </span><span class="n">buf</span><span class="p">;</span>
<span class="w"> </span><span class="o">*</span><span class="n">ptr</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">ptr</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Rows internally are stored in the same format that MariaDB</span>
<span class="w"> </span><span class="c1">// wants. So we can just copy them over.</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">uchar</span><span class="o">>></span><span class="w"> </span><span class="n">row</span><span class="o">=</span><span class="w"> </span><span class="n">memem_table</span><span class="o">-></span><span class="n">rows</span><span class="p">[</span><span class="n">current_position</span><span class="p">];</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">copy</span><span class="p">(</span><span class="n">row</span><span class="o">-></span><span class="n">begin</span><span class="p">(),</span><span class="w"> </span><span class="n">row</span><span class="o">-></span><span class="n">end</span><span class="p">(),</span><span class="w"> </span><span class="n">ptr</span><span class="p">);</span>
<span class="w"> </span><span class="n">current_position</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
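<p>To make the round trip concrete, here is a minimal standalone sketch of the same idea outside of MariaDB. Everything named <code>Sketch*</code> or <code>sketch_*</code> is hypothetical, and like the engine code above it assumes every column is an <code>int</code>-sized <code>INT</code> with no NULLs: the write side skips the leading byte and stores the rest, and the read side writes a zero byte and copies the stored bytes back.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

using uchar = unsigned char;
using SketchRow = std::shared_ptr<std::vector<uchar>>;

// Hypothetical stand-in for the engine's in-memory table.
struct SketchTable
{
  std::vector<SketchRow> rows;
};

// Build a record the way this sketch assumes the server lays it out:
// one leading byte (no NULLs set) followed by the raw int columns.
std::vector<uchar> sketch_make_record(const std::vector<int> &values)
{
  std::vector<uchar> record(1 + sizeof(int) * values.size(), 0);
  if (!values.empty())
    std::memcpy(record.data() + 1, values.data(), sizeof(int) * values.size());
  return record;
}

// Mirrors write_row: skip the leading byte, keep the rest untouched.
void sketch_write_row(SketchTable &table, const uchar *buf, std::size_t field_count)
{
  buf++;
  table.rows.push_back(std::make_shared<std::vector<uchar>>(
      buf, buf + sizeof(int) * field_count));
}

// Mirrors rnd_next: returning false plays the role of HA_ERR_END_OF_FILE.
bool sketch_read_row(const SketchTable &table, std::size_t &position, uchar *buf)
{
  if (position == table.rows.size())
    return false;
  uchar *ptr= buf;
  *ptr= 0; // Assume there are no NULLs.
  ptr++;
  const SketchRow &row= table.rows[position];
  std::copy(row->begin(), row->end(), ptr);
  position++;
  return true;
}
```

<p>Feeding a record built by <code>sketch_make_record({2, 1029})</code> through <code>sketch_write_row</code> and then <code>sketch_read_row</code> should hand back the same bytes it was given.</p>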
<p>And we're done!</p>
<h3 id="build-and-test">Build and test</h3><p>Go back into the <code>build</code> directory we created within the source tree
root and rerun <code>make -j8</code>.</p>
<p>Kill the server (you'll need to do something like <code>killall mariadbd</code>
since the server doesn't respond to Ctrl-c). And restart it.</p>
<p>For some reason this plugin doesn't need to be loaded. We can run
<code>SHOW PLUGINS;</code> in the MariaDB CLI and we'll see it.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>./build/client/mariadb<span class="w"> </span>--defaults-extra-file<span class="o">=</span>/home/phil/vendor/mariadb/my.cnf<span class="w"> </span>--database<span class="o">=</span><span class="nb">test</span>
<span class="go">Reading table information for completion of table and column names</span>
<span class="go">You can turn off this feature to get a quicker startup with -A</span>
<span class="go">Welcome to the MariaDB monitor. Commands end with ; or \g.</span>
<span class="go">Your MariaDB connection id is 5</span>
<span class="go">Server version: 11.4.0-MariaDB-debug Source distribution</span>
<span class="go">Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.</span>
<span class="go">Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.</span>
<span class="go">MariaDB [test]> SHOW PLUGINS;</span>
<span class="go">+-------------------------------+----------+--------------------+-----------------+---------+</span>
<span class="go">| Name | Status | Type | Library | License |</span>
<span class="go">+-------------------------------+----------+--------------------+-----------------+---------+</span>
<span class="go">| binlog | ACTIVE | STORAGE ENGINE | NULL | GPL |</span>
<span class="go">...</span>
<span class="go">| MEMEM | ACTIVE | STORAGE ENGINE | NULL | GPL |</span>
<span class="go">...</span>
<span class="go">| BLACKHOLE | ACTIVE | STORAGE ENGINE | ha_blackhole.so | GPL |</span>
<span class="go">+-------------------------------+----------+--------------------+-----------------+---------+</span>
<span class="go">73 rows in set (0.012 sec)</span>
</pre></div>
<p>There we go! To create a table with it we need to set <code>ENGINE =
MEMEM</code>. For example, <code>CREATE TABLE x (i INT) ENGINE = MEMEM</code>.</p>
<p>Let's create a script to try out the <code>memem</code> engine, in
<code>storage/memem/test.sql</code>.</p>
<div class="highlight"><pre><span></span><span class="k">drop</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="k">exists</span><span class="w"> </span><span class="n">y</span><span class="p">;</span>
<span class="k">drop</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="k">exists</span><span class="w"> </span><span class="n">z</span><span class="p">;</span>
<span class="k">create</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">y</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="nb">int</span><span class="p">,</span><span class="w"> </span><span class="n">j</span><span class="w"> </span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="n">engine</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MEMEM</span><span class="p">;</span>
<span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="k">values</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">1029</span><span class="p">);</span>
<span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="k">values</span><span class="w"> </span><span class="p">(</span><span class="mi">92</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">);</span>
<span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span>
<span class="k">create</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">z</span><span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="n">engine</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MEMEM</span><span class="p">;</span>
<span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">z</span><span class="w"> </span><span class="k">values</span><span class="w"> </span><span class="p">(</span><span class="mi">322</span><span class="p">);</span>
<span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">z</span><span class="w"> </span><span class="k">values</span><span class="w"> </span><span class="p">(</span><span class="mi">8</span><span class="p">);</span>
<span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">z</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">20</span><span class="p">;</span>
</pre></div>
<p>And run it.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>./build/client/mariadb<span class="w"> </span>--defaults-extra-file<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/my.cnf<span class="w"> </span>--database<span class="o">=</span><span class="nb">test</span><span class="w"> </span>--table<span class="w"> </span>--verbose<span class="w"> </span><<span class="w"> </span>storage/memem/test.sql
<span class="go">--------------</span>
<span class="go">drop table if exists y</span>
<span class="go">--------------</span>
<span class="go">--------------</span>
<span class="go">drop table if exists z</span>
<span class="go">--------------</span>
<span class="go">--------------</span>
<span class="go">create table y(i int, j int) engine = MEMEM</span>
<span class="go">--------------</span>
<span class="go">--------------</span>
<span class="go">insert into y values (2, 1029)</span>
<span class="go">--------------</span>
<span class="go">--------------</span>
<span class="go">insert into y values (92, 8)</span>
<span class="go">--------------</span>
<span class="go">--------------</span>
<span class="go">select * from y where i + 8 = 10</span>
<span class="go">--------------</span>
<span class="go">+------+------+</span>
<span class="go">| i | j |</span>
<span class="go">+------+------+</span>
<span class="go">| 2 | 1029 |</span>
<span class="go">+------+------+</span>
<span class="go">--------------</span>
<span class="go">create table z(a int) engine = MEMEM</span>
<span class="go">--------------</span>
<span class="go">--------------</span>
<span class="go">insert into z values (322)</span>
<span class="go">--------------</span>
<span class="go">--------------</span>
<span class="go">insert into z values (8)</span>
<span class="go">--------------</span>
<span class="go">--------------</span>
<span class="go">select * from z where a > 20</span>
<span class="go">--------------</span>
<span class="go">+------+</span>
<span class="go">| a |</span>
<span class="go">+------+</span>
<span class="go">| 322 |</span>
<span class="go">+------+</span>
</pre></div>
<p>What you see there is the power of storage engines! MariaDB still
handles the full SQL language even though we implemented storage somewhere
completely different from the default.</p>
<h3 id="in-memory-is-boring">In-memory is boring</h3><p>Certainly, I'm getting bored doing the same project over and over
again on different databases. However, it's minimal projects like this
that make it super easy to then go and port the storage engine to
something else.</p>
<p>The goal here is to be minimal but meaningful. And I've accomplished
that for myself at least!</p>
<h3 id="on-chatgpt">On ChatGPT</h3><p>As I've <a href="https://notes.eatonphil.com/2023-11-19-exploring-a-postgres-query-plan.html#postscript:-on-chatgpt">written
before</a>,
this sort of exploration wouldn't be possible within the time frame I
gave myself if it weren't for ChatGPT. Specifically, the paid tier
GPT4.</p>
<p>Neither the MySQL nor the MariaDB docs were so helpful that I could
immediately figure out things like how to get the current table name
within a scan (the <code>table</code> member of the <code>handler</code> class).</p>
<p>With ChatGPT you can ask questions like: "In a MySQL C++ plugin, how
do I get the name of the table from a <code>handler</code> class as a C
string?". Sometimes it's right and sometimes it's not. But you can
try out the code and if it builds it is at least somewhat correct!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a post walking you through building a super minimal in-memory storage engine for MySQL/MariaDB in 218 lines of C++.<br><br>And took time again to reflect on the limitations of custom storage engines and how MySQL compares to Postgres internally here.<a href="https://t.co/nImUC36DPs">https://t.co/nImUC36DPs</a> <a href="https://t.co/1Oj2Lcua8O">pic.twitter.com/1Oj2Lcua8O</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1744822526088282587?ref_src=twsrc%5Etfw">January 9, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2024-01-09-minimal-in-memory-storage-engine-for-mysql.htmlTue, 09 Jan 2024 00:00:00 +0000
- Make your own wayhttp://notes.eatonphil.com/2023-12-26-make-your-own-way.html<p>Over the years, I have repeatedly felt like I missed the timing for a
meetup or an IRC group or social media in general. I'd go to a meetup
every so often but I'd never make a meaningful connection with people,
whereas everyone else knew each other. I'd join an IRC group and have
difficulty catching up with what seemed to be the flow of
conversation.</p>
<p>I hadn't thought much about this until the pandemic when I started a
<a href="https://eatonphil.com/discord.html">Discord group for software
internals</a> and a virtual tech talk
series called Hacker Nights. Since 2021 the Discord has reached around
1,500 members, ~20 of them fairly active. And the Meetup peaked at
about 300 members with about 10-20 showing up to each Meetup.</p>
<p>After the pandemic receded I started an <a href="https://eatonphil.com/2023-ddia.html">NYC-based book
club</a> over 2 months with about
5-8 active attendees. I ran a <a href="https://eatonphil.com/2023-10-wehack-postgres.html">virtual hack week on
Discord</a> where I
got ~100 devs into a temporary Discord server and we talked about
Postgres internals and shared resources. Ultimately around 5 of us wrote blog
posts and built new projects to explore Postgres.</p>
<p>I started a <a href="https://eatonphil.com/2023-database-internals.html">virtual, async email book
club</a> (that is
still ongoing) with 300 devs from November 2023 to Feb 2024. There
have been around 20 active members of the club. And each week the
discussion is kicked off by one of the members, not myself.</p>
<p>And I felt like there wasn't enough community opportunity for folks in
systems programming in NYC so I started a <a href="https://eatonphil.com/nyc-systems-coffee-club.html">Manhattan-based Systems
Coffee
Club</a>. Around 15
people showed up to the first meeting and seemed even more excited
about it than I was. (And I was excited!) We'll see where it goes
from here.</p>
<p>Organizing people to do this stuff doesn't come easy to me. I enjoy
doing it to a degree, but every night before an event I have trouble
sleeping. Worried about embarrassing myself. When the event happens
though, and people are happy to be there to chat with everyone else,
as they invariably have been, it makes it worthwhile.</p>
<h3 id="everyone-want-community">Everyone wants community</h3><p>Something I realized along the way is that people (maybe devs
especially, I don't know) are looking for community. And when I have
noticed there seems to be a missing flashpoint (a topic, a career
focus, a book, etc.) for community, it's been pretty easy to get
people together around it.</p>
<h3 id="the-lifecycle-of-groups">The lifecycle of groups</h3><p>Groups, meetups, naturally live and die. Organizers get burnt out. I
don't see this as a problem. It's just the way it is.</p>
<p>At some point I'll get burnt out too. Or I'll get pickier. For
example, I've been avoiding starting a systems programming meetup in
NYC because I know it will be a big effort. So I've done lower effort
groups like book clubs and coffee clubs.</p>
<p>Don't worry about signing yourself up for indefinite work. Just do
whatever you'd like to and don't feel bad if you have to stop. Someone
else will eventually start the next great group, even if it comes in a
different medium or flavor.</p>
<h3 id="community-is-contagious">Community is contagious</h3><p>There are great communities out there that have inspired me.</p>
<ul>
<li>Aleksey Charapko's and Murat Demirbas's virtual
<a href="https://charap.co/reading-group/">Distributed Systems Reading Group</a></li>
<li>Alex Petrov's <a href="https://twitter.com/ifesdjeen">database paper reading group</a></li>
<li>Andy Pavlo's <a href="https://db.cs.cmu.edu/seminar2023/">database interview series</a></li>
<li>Paul Butler's <a href="https://browsertech.com/nyc">BrowserTech meetup</a></li>
<li>Eric Zhang's <a href="https://twitter.com/ekzhang1/status/1700993939841716254">New York Systems Reading Group</a></li>
</ul>
<p>And this year I've been hearing about more.</p>
<ul>
<li>TU Munich students <a href="https://www.tumuchdata.club/">started a Student Database Group</a></li>
<li>A group of developers <a href="https://twitter.com/Keleesssss/status/1720466270032691460">starting a Türkiye-language CS reading group</a></li>
</ul>
<p>There are yet a few more systems programming groups I've heard rumors
about being started on the US West Coast and in Stockholm.</p>
<h3 id="do-whatever-you-want!">Do whatever you want!</h3><p>If you feel like you can't find the right group or that you don't fit
in with existing groups or that you're missing a moment, there are
surely other folks in the same boat. Waiting for a new group to
join. You may be the catalyst.</p>
<p>There's enormous potential for getting people together and doing
something interesting and there isn't necessarily anyone telling you
you should. Things you try may work and they may not. The more you try
the more you'll learn what works and what doesn't. I've had a few
years of <a href="https://notes.eatonphil.com/eight-years-of-tech-meetups.html">making mistakes
organizing</a>
to hone the sense.</p>
<p>The only boring thing to do is to unnecessarily limit yourself to the
sort of thing others have done before! Run a browser meetup instead of
a React meetup. Interview hardware developers to teach software
developers something. Get software developers with 20 years of
experience in niche fields to teach the rest of us something. Read
books beyond SICP or Clean Code. Try difficult programming projects.</p>
<p>Whatever you want though, don't let me deter you. If you think
something should exist, give it a shot!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I used to struggle to get much out of meetups, couldn't pick up the flow of IRC. Some point I stopped trying solely to fit in. Instead to do what I thought was interesting. And to my surprise, folks were interested in coming along too!<br><br>Make your own way<a href="https://t.co/tVEa2ndiZm">https://t.co/tVEa2ndiZm</a> <a href="https://t.co/piWSsv14lj">pic.twitter.com/piWSsv14lj</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1740150745931149471?ref_src=twsrc%5Etfw">December 27, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2023-12-26-make-your-own-way.htmlWed, 27 Dec 2023 00:00:00 +0000
- Exploring a Postgres query planhttp://notes.eatonphil.com/2023-11-19-exploring-a-postgres-query-plan.html<p><!-- -*- mode: markdown -*- --></p>
<p>I learned this week that you can intercept and redirect Postgres query
execution. You can hook into the execution layer so you're given a
query plan and you get to decide what to do with it. What rows to
return, if any, and where they come from.</p>
<p>That's very interesting. So I started writing code to explore execution
hooks. However, I got stuck interpreting the query plan. Either
there's no query plan walking infrastructure or I just didn't find it.</p>
<p>So this post is a digression into walking a Postgres query plan. By
the end we'll be able to run <code>psql -c 'SELECT a FROM x WHERE a > 1'</code>
and reconstruct the entire SQL string from a Postgres
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/executor/execdesc.h#L33"><code>QueryDesc</code></a>
object, the query plan object Postgres builds.</p>
<p>With that query plan walking infrastructure in place, we'll be in a
good state to not just print out the query plan while walking it but
instead to translate the query plan or evaluate it in our own way
(e.g. over column-wise data, or <a href="https://github.com/citusdata/postgres_vectorization_test">vectorized execution over row-wise
data</a>).</p>
<p>Code for this project is <a href="https://github.com/eatonphil/pgexec">available on
GitHub</a>.</p>
<h3 id="what-is-a-query-plan?">What is a query plan?</h3><p>If you're familiar with parsers and compilers, a query plan is like an
intermediate representation (IR) of a program. It is not as raw as an
abstract syntax tree (AST); it has already been optimized.</p>
<p>If that doesn't mean anything to you, think of a query plan as a
structured and optimized version of the SQL query you submit to your
database. It isn't text anymore. It is <a href="https://buttondown.email/jaffray/archive/why-are-query-plans-trees/">probably a
tree</a>.</p>
<p>Check out another Justin Jaffray <a href="https://justinjaffray.com/what-is-a-query-optimizer-for/">article on the
subject</a> for
more detail.</p>
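<p>If the tree shape is hard to picture, here is a toy sketch of walking a plan and turning it back into SQL. It is only an illustration: <code>ToyPlan</code>, <code>toy_node</code>, and <code>toy_plan_to_sql</code> are made-up names, and real Postgres plan nodes are shaped differently from these.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy plan node. NOT a Postgres type; purely illustrative.
struct ToyPlan
{
  enum class Kind { Project, Filter, SeqScan } kind;
  std::string detail;             // Column list, predicate text, or table name.
  std::unique_ptr<ToyPlan> child; // The node that feeds rows into this one.
};

// Small helper to build a node with an optional child.
std::unique_ptr<ToyPlan> toy_node(ToyPlan::Kind kind, std::string detail,
                                  std::unique_ptr<ToyPlan> child = nullptr)
{
  auto node = std::make_unique<ToyPlan>();
  node->kind = kind;
  node->detail = std::move(detail);
  node->child = std::move(child);
  return node;
}

// Walk from the root down to the scan, collecting the pieces of the
// query, then stitch them back together as a SQL string.
std::string toy_plan_to_sql(const ToyPlan &root)
{
  std::string columns = "*", table, where;
  for (const ToyPlan *node = &root; node != nullptr; node = node->child.get())
  {
    switch (node->kind)
    {
    case ToyPlan::Kind::Project: columns = node->detail; break;
    case ToyPlan::Kind::Filter:  where = node->detail;   break;
    case ToyPlan::Kind::SeqScan: table = node->detail;   break;
    }
  }
  std::string sql = "SELECT " + columns + " FROM " + table;
  if (!where.empty())
    sql += " WHERE " + where;
  return sql;
}
```

<p>A three-node chain of project over filter over scan is enough to rebuild a query like the one this post targets. The real walk is the same idea, just over Postgres's own node types.</p>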
<h3 id="development-environment">Development environment</h3><p>Before we get to walking the query plan, let's set up the
infrastructure to intercept query execution where we can eventually
add in our print debugging of the query plan reconstructed as a SQL
string.</p>
<p>Once you've got <a href="https://wiki.postgresql.org/wiki/Compile_and_Install_from_source_code">Postgres build
dependencies</a>,
build and install a debug version of Postgres:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/postgres/postgres<span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nb">cd</span><span class="w"> </span>postgres
$<span class="w"> </span><span class="c1"># Make sure you're on the same commit I'm on, just to be safe.</span>
$<span class="w"> </span>git<span class="w"> </span>checkout<span class="w"> </span>b218fbb7a35fcf31539bfad12732038fe082a2eb
$<span class="w"> </span>./configure<span class="w"> </span>--enable-cassert<span class="w"> </span>--enable-debug<span class="w"> </span><span class="nv">CFLAGS</span><span class="o">=</span><span class="s2">"-ggdb -Og -g3 -fno-omit-frame-pointer"</span>
$<span class="w"> </span>make<span class="w"> </span>-j8
$<span class="w"> </span><span class="c1"># Installs to /usr/local/pgsql/bin.</span>
$<span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install
</pre></div>
<p>I'm not going to cover Postgres extension infrastructure in detail. I
wrote a bit about it in <a href="https://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html">my last
post</a>.
You need only read the first half, if at all; not the actual Table
Access Method implementation.</p>
<p>It will be even simpler in this post because Postgres hooks are
extensions but not extensions you install with <code>CREATE EXTENSION</code>. If
you want to read about the different kinds of Postgres extensions,
check out <a href="https://tembo.io/blog/four-types-of-extensions/">this
article</a> by Steven
Miller.</p>
<p>The minimum we need, aside from the hook code itself, is a Makefile
that uses
<a href="https://www.postgresql.org/docs/current/extend-pgxs.html">PGXS</a>:</p>
<div class="highlight"><pre><span></span><span class="nv">MODULES</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>pgexec
<span class="nv">PG_CONFIG</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>/usr/local/pgsql/bin/pg_config
<span class="nv">PGXS</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">$(</span>shell<span class="w"> </span><span class="k">$(</span>PG_CONFIG<span class="k">)</span><span class="w"> </span>--pgxs<span class="k">)</span>
<span class="cp">include $(PGXS)</span>
</pre></div>
<p>The <code>MODULES</code> value there corresponds to the C file we'll create
shortly, <code>pgexec.c</code>.</p>
<p class="note">
This <code>pg_config</code> binary path is important because you
might have different versions of Postgres installed, for example by
your package manager. It is important that the extension is built
against the same version of Postgres which will load the extension.
</p><p>Now we're ready for some hook code.</p>
<h3 id="intercepting-query-execution">Intercepting query execution</h3><p>You can find the basic structure of a hook (and which hooks are
available) in Tamika Nomara's <a href="https://github.com/taminomara/psql-hooks">unofficial Postgres hooks
docs</a>.</p>
<p class="note">
There is no official central place describing all hooks I could find
in Postgres docs. Some hooks are described in various places
throughout the docs though.
</p><p>Based on that page, we can write a bare minimum hook that will
intercept queries, log when we've done so, and pass control back
to the standard execution path for the actual query. In <code>pgexec.c</code>:</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"postgres.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"fmgr.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"executor/executor.h"</span>
<span class="n">PG_MODULE_MAGIC</span><span class="p">;</span>
<span class="k">static</span><span class="w"> </span><span class="n">ExecutorRun_hook_type</span><span class="w"> </span><span class="n">prev_executor_run_hook</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">print_plan</span><span class="p">(</span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[pgexec] HOOKED SUCCESSFULLY!"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">pgexec_run_hook</span><span class="p">(</span>
<span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span>
<span class="w"> </span><span class="n">ScanDirection</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span>
<span class="w"> </span><span class="n">uint64</span><span class="w"> </span><span class="n">count</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">execute_once</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">print_plan</span><span class="p">(</span><span class="n">queryDesc</span><span class="p">);</span>
<span class="w"> </span><span class="n">prev_executor_run_hook</span><span class="p">(</span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="p">,</span><span class="w"> </span><span class="n">execute_once</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">_PG_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">prev_executor_run_hook</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ExecutorRun_hook</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">prev_executor_run_hook</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">prev_executor_run_hook</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">standard_ExecutorRun</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">ExecutorRun_hook</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgexec_run_hook</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">_PG_fini</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ExecutorRun_hook</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">prev_executor_run_hook</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>You can discover the <code>standard_ExecutorRun</code> function from a quick
<code>git grep ExecutorRun_hook</code> in the Postgres source which leads to
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/backend/executor/execMain.c#L306">src/backend/executor/execMain.c#L306</a>:</p>
<div class="highlight"><pre><span></span><span class="kt">void</span>
<span class="nf">ExecutorRun</span><span class="p">(</span><span class="n">QueryDesc</span><span class="w"> </span><span class="o">*</span><span class="n">queryDesc</span><span class="p">,</span>
<span class="w"> </span><span class="n">ScanDirection</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span><span class="w"> </span><span class="n">uint64</span><span class="w"> </span><span class="n">count</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">execute_once</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ExecutorRun_hook</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">ExecutorRun_hook</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="p">,</span><span class="w"> </span><span class="n">execute_once</span><span class="p">);</span>
<span class="w"> </span><span class="k">else</span>
<span class="w"> </span><span class="n">standard_ExecutorRun</span><span class="p">(</span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="p">,</span><span class="w"> </span><span class="n">execute_once</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
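<p>The save-the-old-hook, install-the-new-hook pattern is easier to see with
the Postgres types stripped away. Here is a minimal standalone sketch of it.
Every name in it (<code>run_hook</code>, <code>standard_run</code>,
<code>logging_hook</code>, <code>install_hook</code>) is a made-up stand-in, not
a Postgres API, and the hook takes a plain string rather than a
<code>QueryDesc</code>.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in: real Postgres hooks take a QueryDesc and more. */
typedef void (*run_hook_type)(const char *query);

int standard_calls = 0; /* calls that reached the default behavior */
int hooked_calls = 0;   /* calls that passed through our hook */

/* Plays the role of standard_ExecutorRun: the default behavior. */
void standard_run(const char *query) {
  (void)query;
  standard_calls++;
}

/* Plays the role of ExecutorRun_hook: NULL until someone installs a hook. */
run_hook_type run_hook = NULL;
run_hook_type prev_run_hook = NULL;

/* Plays the role of ExecutorRun: call the hook if set, else the default. */
void run(const char *query) {
  if (run_hook)
    (*run_hook)(query);
  else
    standard_run(query);
}

/* Our hook: log, then hand control to whatever was installed before us. */
void logging_hook(const char *query) {
  assert(prev_run_hook != NULL);
  hooked_calls++;
  printf("[sketch] hooked: %s\n", query);
  prev_run_hook(query);
}

/* Plays the role of _PG_init: remember the previous hook, install ours. */
void install_hook(void) {
  prev_run_hook = run_hook ? run_hook : standard_run;
  run_hook = logging_hook;
}
```

<p>Before <code>install_hook()</code> is called, <code>run()</code> goes straight
to <code>standard_run()</code>. Afterwards every call passes through
<code>logging_hook()</code> first and still ends up in
<code>standard_run()</code>, which is the behavior we want from the real
extension.</p>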
<p>So our hook will just log and pass back execution to the existing
execution hook. Let's build and install the extension.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>make
<span class="gp">$ </span>sudo<span class="w"> </span>make<span class="w"> </span>install
</pre></div>
<p>Now we'll create a new database and tell it to load the extension.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/initdb<span class="w"> </span>test-db
<span class="gp">$ </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"shared_preload_libraries = 'pgexec'"</span><span class="w"> </span>><span class="w"> </span>test-db/postgresql.conf
</pre></div>
<p class="note">
Remember, hooks are not <code>CREATE EXTENSION</code> extensions. As
far as I can tell they can't be dynamically loaded (without some
additional dynamic loading infrastructure one could potentially
write). So every time you make a change you need to rebuild the
extension, reinstall it, and restart the Postgres server.
</p><p>And start the server in the foreground:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/postgres<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--config-file<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/test-db/postgresql.conf<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-D<span class="w"> </span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/test-db
<span class="go"> -k $(pwd)/test-db</span>
<span class="go">2023-11-18 19:35:16.680 GMT [3215547] LOG: starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1), 64-bit</span>
<span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on IPv6 address "::1", port 5432</span>
<span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on IPv4 address "127.0.0.1", port 5432</span>
<span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"</span>
<span class="go">2023-11-18 19:35:16.682 GMT [3215550] LOG: database system was shut down at 2023-11-18 19:20:16 GMT</span>
<span class="go">2023-11-18 19:35:16.684 GMT [3215547] LOG: database system is ready to accept connections</span>
</pre></div>
<p>Keep an eye on this foreground process since this is where <code>elog(LOG,
...)</code> calls will show up.</p>
<p>Now in a new window, create a <code>test.sql</code> script that we can use to
exercise the hook:</p>
<div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">x</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="nb">INT</span><span class="p">);</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">309</span><span class="p">);</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
</pre></div>
<p>Run <code>psql</code> so we can trigger the hook:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>-h<span class="w"> </span>localhost<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">DROP TABLE</span>
<span class="go">CREATE TABLE</span>
<span class="go">INSERT 0 1</span>
<span class="go"> a</span>
<span class="go">-----</span>
<span class="go"> 309</span>
<span class="go">(1 row)</span>
</pre></div>
<p>And in the <code>postgres</code> foreground process you should see a log:</p>
<div class="highlight"><pre><span></span><span class="go">2023-11-19 17:42:03.045 GMT [3242321] LOG: [pgexec] HOOKED SUCCESSFULLY!</span>
<span class="go">2023-11-19 17:42:03.045 GMT [3242321] STATEMENT: INSERT INTO x VALUES (309);</span>
<span class="go">2023-11-19 17:42:03.045 GMT [3242321] LOG: [pgexec] HOOKED SUCCESSFULLY!</span>
<span class="go">2023-11-19 17:42:03.045 GMT [3242321] STATEMENT: SELECT a FROM x WHERE a > 1;</span>
</pre></div>
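<p>That's our hook! Interestingly only the <code>INSERT</code> and <code>SELECT</code>
statements show up, not the <code>DROP</code> and <code>CREATE</code>. Those are utility
statements, which Postgres handles through <code>ProcessUtility</code> rather than
<code>ExecutorRun</code>.</p>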
<p>Now let's see if we can reconstruct the query text from that first
argument, the <code>QueryDesc*</code> that <code>pgexec_run_hook</code> receives. And let's
simplify things for ourselves and only worry about reconstructing a
<code>SELECT</code> query.</p>
<h3 id="<code>node</code>s-and-<code>datum</code>s"><code>Node</code>s and <code>Datum</code>s</h3><p>But first, let's talk about two fundamental ways data in Postgres
(code) is organized.</p>
<p>Postgres code is extremely dynamic and, maybe relatedly,
fairly object-oriented. Almost every entity in Postgres is a
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/nodes.h#L128"><code>Node</code></a>, while
values in Postgres that are exposed to users of Postgres are
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/postgres.h#L64"><code>Datum</code></a>s.</p>
<p>Each node has a type,
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/nodes.h#L26"><code>NodeTag</code></a>,
that we can switch on to decide what to do. In contrast, <code>Datum</code> has
no type. The type of the <code>Datum</code> must be known by context before using
one of the transform functions like
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/postgres.h#L90"><code>DatumGetBool</code></a>
to retrieve a C value from a <code>Datum</code>.</p>
<p>A table is a <code>Node</code>. A query plan is a <code>Node</code>. A sequential scan is a
<code>Node</code>. A join is a <code>Node</code>. A literal in a query is a <code>Node</code>. The
value for the literal in a query is a <code>Datum</code>.</p>
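<p>To make the contrast concrete, here is a small standalone sketch. It is
not Postgres code: <code>SketchNode</code>, <code>SketchTag</code>,
<code>SketchDatum</code> and the helper functions are hypothetical, cut-down
stand-ins that only imitate the shape of <code>Node</code>,
<code>NodeTag</code> and <code>Datum</code>.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tags; the real NodeTag enum is generated and much larger. */
typedef enum SketchTag { SKETCH_SEQSCAN, SKETCH_CONST } SketchTag;

/* Every node starts with its tag, so any node pointer can be inspected. */
typedef struct SketchNode {
  SketchTag type;
} SketchNode;

/* A Datum-like value: one machine word with no record of what it holds. */
typedef uintptr_t SketchDatum;

/* Nodes carry their type, so we can switch on it. */
const char *sketch_node_name(const SketchNode *node) {
  assert(node != NULL);
  switch (node->type) {
  case SKETCH_SEQSCAN:
    return "SeqScan";
  case SKETCH_CONST:
    return "Const";
  }
  return "unknown";
}

/* Datums don't: the caller must already know which conversion applies. */
SketchDatum sketch_int32_get_datum(int32_t v) { return (SketchDatum)v; }
int32_t sketch_datum_get_int32(SketchDatum d) { return (int32_t)d; }
bool sketch_datum_get_bool(SketchDatum d) { return d != 0; }
```

<p>Nothing stops you from calling <code>sketch_datum_get_bool()</code> on a
value that was stored as an integer. That is the sense in which a
<code>Datum</code> has no type: picking the right conversion is entirely up to
the surrounding code.</p>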
<p>Here, for example, is how the book The Internals of PostgreSQL
<a href="https://www.interdb.jp/pg/pgsql03.html">visualizes</a> a query
plan:</p>
<p><img src="https://www.interdb.jp/pg/img/fig-3-04.png" alt="https://www.interdb.jp/pg/img/fig-3-04.png"></p>
<p>Every box in that image is a <code>Node</code>.</p>
<p>And all the <code>Node</code>s I've seen in the code share a common definition prefix
like this:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">SomeThing</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">abstract</span><span class="p">)</span><span class="w"> </span><span class="c1">// If the node is indeed abstract in the OOP sense.</span>
<span class="w"> </span><span class="n">NodeTag</span><span class="w"> </span><span class="n">type</span><span class="p">;</span>
<span class="p">}</span><span class="w"> </span><span class="n">SomeThing</span><span class="p">;</span>
</pre></div>
<p>Many <code>Node</code>s you'll see are abstract, like <code>Plan</code>. But by printing out
the <code>NodeTag</code> and looking up the printed value in
<code>src/include/nodes/nodetags.h</code>, you can find the concrete type of the
<code>Node</code>.</p>
<p><code>src/include/nodes/nodetags.h</code> is generated during a preprocessing
step. (Don't look if <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/backend/nodes/gen_node_support.pl">regex in
Perl</a>
worries you).</p>
<p>We'll get back to <code>Node</code>s later.</p>
<h3 id="what's-in-a-<code>querydesc</code>?">What's in a <code>QueryDesc</code>?</h3><p>Let's take a look at the
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/executor/execdesc.h#L33"><code>QueryDesc</code></a>
struct:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">QueryDesc</span>
<span class="p">{</span>
<span class="w"> </span><span class="cm">/* These fields are provided by CreateQueryDesc */</span>
<span class="w"> </span><span class="n">CmdType</span><span class="w"> </span><span class="n">operation</span><span class="p">;</span><span class="w"> </span><span class="cm">/* CMD_SELECT, CMD_UPDATE, etc. */</span>
<span class="w"> </span><span class="n">PlannedStmt</span><span class="w"> </span><span class="o">*</span><span class="n">plannedstmt</span><span class="p">;</span><span class="w"> </span><span class="cm">/* planner's output (could be utility, too) */</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">sourceText</span><span class="p">;</span><span class="w"> </span><span class="cm">/* source text of the query */</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">;</span><span class="w"> </span><span class="cm">/* snapshot to use for query */</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">crosscheck_snapshot</span><span class="p">;</span><span class="w"> </span><span class="cm">/* crosscheck for RI update/delete */</span>
<span class="w"> </span><span class="n">DestReceiver</span><span class="w"> </span><span class="o">*</span><span class="n">dest</span><span class="p">;</span><span class="w"> </span><span class="cm">/* the destination for tuple output */</span>
<span class="w"> </span><span class="n">ParamListInfo</span><span class="w"> </span><span class="n">params</span><span class="p">;</span><span class="w"> </span><span class="cm">/* param values being passed in */</span>
<span class="w"> </span><span class="n">QueryEnvironment</span><span class="w"> </span><span class="o">*</span><span class="n">queryEnv</span><span class="p">;</span><span class="w"> </span><span class="cm">/* query environment passed in */</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">instrument_options</span><span class="p">;</span><span class="w"> </span><span class="cm">/* OR of InstrumentOption flags */</span>
<span class="w"> </span><span class="cm">/* These fields are set by ExecutorStart */</span>
<span class="w"> </span><span class="n">TupleDesc</span><span class="w"> </span><span class="n">tupDesc</span><span class="p">;</span><span class="w"> </span><span class="cm">/* descriptor for result tuples */</span>
<span class="w"> </span><span class="n">EState</span><span class="w"> </span><span class="o">*</span><span class="n">estate</span><span class="p">;</span><span class="w"> </span><span class="cm">/* executor's query-wide state */</span>
<span class="w"> </span><span class="n">PlanState</span><span class="w"> </span><span class="o">*</span><span class="n">planstate</span><span class="p">;</span><span class="w"> </span><span class="cm">/* tree of per-plan-node state */</span>
<span class="w"> </span><span class="cm">/* This field is set by ExecutorRun */</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">already_executed</span><span class="p">;</span><span class="w"> </span><span class="cm">/* true if previously executed */</span>
<span class="w"> </span><span class="cm">/* This is always set NULL by the core system, but plugins can change it */</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Instrumentation</span><span class="w"> </span><span class="o">*</span><span class="n">totaltime</span><span class="p">;</span><span class="w"> </span><span class="cm">/* total time spent in ExecutorRun */</span>
<span class="p">}</span><span class="w"> </span><span class="n">QueryDesc</span><span class="p">;</span>
</pre></div>
<p>The
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/plannodes.h#L46"><code>PlannedStmt</code></a>
field looks interesting. Let's take a look:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">PlannedStmt</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">no_equal</span><span class="p">,</span><span class="w"> </span><span class="n">no_query_jumble</span><span class="p">)</span>
<span class="w"> </span><span class="n">NodeTag</span><span class="w"> </span><span class="n">type</span><span class="p">;</span>
<span class="w"> </span><span class="n">CmdType</span><span class="w"> </span><span class="n">commandType</span><span class="p">;</span><span class="w"> </span><span class="cm">/* select|insert|update|delete|merge|utility */</span>
<span class="w"> </span><span class="n">uint64</span><span class="w"> </span><span class="n">queryId</span><span class="p">;</span><span class="w"> </span><span class="cm">/* query identifier (copied from Query) */</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">hasReturning</span><span class="p">;</span><span class="w"> </span><span class="cm">/* is it insert|update|delete RETURNING? */</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">hasModifyingCTE</span><span class="p">;</span><span class="w"> </span><span class="cm">/* has insert|update|delete in WITH? */</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">canSetTag</span><span class="p">;</span><span class="w"> </span><span class="cm">/* do I set the command result tag? */</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">transientPlan</span><span class="p">;</span><span class="w"> </span><span class="cm">/* redo plan when TransactionXmin changes? */</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">dependsOnRole</span><span class="p">;</span><span class="w"> </span><span class="cm">/* is plan specific to current role? */</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">parallelModeNeeded</span><span class="p">;</span><span class="w"> </span><span class="cm">/* parallel mode required to execute? */</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">jitFlags</span><span class="p">;</span><span class="w"> </span><span class="cm">/* which forms of JIT should be performed */</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Plan</span><span class="w"> </span><span class="o">*</span><span class="n">planTree</span><span class="p">;</span><span class="w"> </span><span class="cm">/* tree of Plan nodes */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">rtable</span><span class="p">;</span><span class="w"> </span><span class="cm">/* list of RangeTblEntry nodes */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">permInfos</span><span class="p">;</span><span class="w"> </span><span class="cm">/* list of RTEPermissionInfo nodes for rtable</span>
<span class="cm"> * entries needing one */</span>
<span class="w"> </span><span class="cm">/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">resultRelations</span><span class="p">;</span><span class="w"> </span><span class="cm">/* integer list of RT indexes, or NIL */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">appendRelations</span><span class="p">;</span><span class="w"> </span><span class="cm">/* list of AppendRelInfo nodes */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">subplans</span><span class="p">;</span><span class="w"> </span><span class="cm">/* Plan trees for SubPlan expressions; note</span>
<span class="cm"> * that some could be NULL */</span>
<span class="w"> </span><span class="n">Bitmapset</span><span class="w"> </span><span class="o">*</span><span class="n">rewindPlanIDs</span><span class="p">;</span><span class="w"> </span><span class="cm">/* indices of subplans that require REWIND */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">rowMarks</span><span class="p">;</span><span class="w"> </span><span class="cm">/* a list of PlanRowMark's */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">relationOids</span><span class="p">;</span><span class="w"> </span><span class="cm">/* OIDs of relations the plan depends on */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">invalItems</span><span class="p">;</span><span class="w"> </span><span class="cm">/* other dependencies, as PlanInvalItems */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">paramExecTypes</span><span class="p">;</span><span class="w"> </span><span class="cm">/* type OIDs for PARAM_EXEC Params */</span>
<span class="w"> </span><span class="n">Node</span><span class="w"> </span><span class="o">*</span><span class="n">utilityStmt</span><span class="p">;</span><span class="w"> </span><span class="cm">/* non-null if this is utility stmt */</span>
<span class="w"> </span><span class="cm">/* statement location in source string (copied from Query) */</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">stmt_location</span><span class="p">;</span><span class="w"> </span><span class="cm">/* start location, or -1 if unknown */</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">stmt_len</span><span class="p">;</span><span class="w"> </span><span class="cm">/* length in bytes; 0 means "rest of string" */</span>
<span class="p">}</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="p">;</span>
</pre></div>
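<p>As an aside, the <code>stmt_location</code> and <code>stmt_len</code> fields
describe where a statement sits inside <code>sourceText</code>. Going only by
the field comments above, slicing out one statement might look something like
this standalone sketch. <code>sketch_statement_text</code> is a hypothetical
helper, not a Postgres function, and it works on plain C strings.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy one statement out of a multi-statement source string into buf.
 * Follows the field comments: location -1 means "unknown" (fall back to
 * the whole string) and len 0 means "rest of string".
 * Returns the number of bytes written, not counting the terminator.
 */
size_t sketch_statement_text(const char *source, int location, int len,
                             char *buf, size_t bufsize) {
  size_t total, start, n;

  assert(source != NULL && buf != NULL && bufsize > 0);
  total = strlen(source);
  /* An unknown or out-of-range location falls back to the start. */
  start = (location < 0 || (size_t)location > total) ? 0 : (size_t)location;
  n = (location < 0 || len <= 0) ? total - start : (size_t)len;
  if (n > total - start)
    n = total - start; /* never read past the source string */
  if (n > bufsize - 1)
    n = bufsize - 1; /* truncate to fit the caller's buffer */
  memcpy(buf, source + start, n);
  buf[n] = '\0';
  return n;
}
```

<p>That only hands back the text the user typed, though. It doesn't tell us
anything about the plan, which is what we're after here.</p>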
<p>The <code>struct Plan* planTree</code> field in <code>PlannedStmt</code> looks like what we'd want. But
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/plannodes.h#L119"><code>Plan</code></a>
is abstract:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Plan</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">abstract</span><span class="p">,</span><span class="w"> </span><span class="n">no_equal</span><span class="p">,</span><span class="w"> </span><span class="n">no_query_jumble</span><span class="p">)</span>
<span class="w"> </span><span class="n">NodeTag</span><span class="w"> </span><span class="n">type</span><span class="p">;</span>
</pre></div>
<p>So let's try printing out the <code>planTree->type</code> field to find out which
concrete <code>Node</code> it is. In <code>pgexec.c</code> change the definition of
<code>print_plan</code>:</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">print_plan</span><span class="p">(</span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[pgexec] HOOKED SUCCESSFULLY! %d"</span><span class="p">,</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-></span><span class="n">plannedstmt</span><span class="o">-></span><span class="n">planTree</span><span class="o">-></span><span class="n">type</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Rebuild and reinstall the extension, and restart Postgres:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>make
<span class="gp">$ </span>sudo<span class="w"> </span>make<span class="w"> </span>install
<span class="gp">$ </span>/usr/local/pgsql/bin/postgres<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>--config-file<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/test-db/postgresql.conf<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-D<span class="w"> </span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/test-db
<span class="go"> -k $(pwd)/test-db</span>
<span class="go">2023-11-18 19:35:16.680 GMT [3215547] LOG: starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1), 64-bit</span>
<span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on IPv6 address "::1", port 5432</span>
<span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on IPv4 address "127.0.0.1", port 5432</span>
<span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"</span>
<span class="go">2023-11-18 19:35:16.682 GMT [3215550] LOG: database system was shut down at 2023-11-18 19:20:16 GMT</span>
<span class="go">2023-11-18 19:35:16.684 GMT [3215547] LOG: database system is ready to accept connections</span>
</pre></div>
<p>And in another window run <code>psql</code>:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>-h<span class="w"> </span>localhost<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
</pre></div>
<p>And check the logs from the <code>postgres</code> process we just started and you
should notice:</p>
<div class="highlight"><pre><span></span><span class="go">2023-11-19 17:46:18.834 GMT [3242495] LOG: [pgexec] HOOKED SUCCESSFULLY! 322</span>
<span class="go">2023-11-19 17:46:18.834 GMT [3242495] STATEMENT: SELECT a FROM x WHERE a > 1;</span>
</pre></div>
<p>So <code>322</code> is the <code>NodeTag</code> for the <code>Plan</code>. If we look that up in
Postgres's <code>src/include/nodes/nodetags.h</code> (remember, this is generated
after <code>./configure && make</code> so I can't link you to it):</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>grep<span class="w"> </span><span class="s1">' = 322'</span><span class="w"> </span>src/include/nodes/nodetags.h
<span class="go"> T_SeqScan = 322,</span>
</pre></div>
<p>Hey, that makes sense! A <code>SELECT</code> without any indexes definitely
sounds like a sequential scan!</p>
<h3 id="walking-a-sequential-scan">Walking a sequential scan</h3><p>Let's take a look at the
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/plannodes.h#L394"><code>SeqScan</code></a>
struct:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">SeqScan</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">scan</span><span class="p">;</span>
<span class="p">}</span><span class="w"> </span><span class="n">SeqScan</span><span class="p">;</span>
</pre></div>
<p>Ok, that's not very interesting. Let's look at
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/plannodes.h#L382"><code>Scan</code></a>
then:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Scan</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">abstract</span><span class="p">)</span>
<span class="w"> </span><span class="n">Plan</span><span class="w"> </span><span class="n">plan</span><span class="p">;</span>
<span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="n">scanrelid</span><span class="p">;</span><span class="w"> </span><span class="cm">/* relid is index into the range table */</span>
<span class="p">}</span><span class="w"> </span><span class="n">Scan</span><span class="p">;</span>
</pre></div>
<p>That's interesting! <code>scanrelid</code> represents the table we're scanning. I
don't know what "range table" means exactly. But there was a field on
the <code>PlannedStmt</code> called <code>rtable</code> that seems relevant.</p>
<p><code>rtable</code> was described as a
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/pg_list.h#L53"><code>List</code></a>
of
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/parsenodes.h#L1019"><code>RangeTblEntry</code></a>
nodes. And browsing around the file where <code>List</code> is defined we can see
some nice methods for working with <code>List</code>s, like <code>list_length()</code>.</p>
<p>Let's print out the <code>scanrelid</code> and let's check out the length of the
<code>rtable</code> and see if it's filled out. Let's also restrict our
<code>print_plan</code> code to only look at <code>SeqScan</code> nodes. In <code>pgexec.c</code>:</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">print_plan</span><span class="p">(</span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">SeqScan</span><span class="o">*</span><span class="w"> </span><span class="n">scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-></span><span class="n">plannedstmt</span><span class="o">-></span><span class="n">planTree</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-></span><span class="n">type</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">T_SeqScan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[pgexec] Unsupported plan type."</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">SeqScan</span><span class="o">*</span><span class="p">)</span><span class="n">plan</span><span class="p">;</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[pgexec] relid: %d, rtable length: %d"</span><span class="p">,</span><span class="w"> </span><span class="n">scan</span><span class="o">-></span><span class="n">scan</span><span class="p">.</span><span class="n">scanrelid</span><span class="p">,</span><span class="w"> </span><span class="n">list_length</span><span class="p">(</span><span class="n">queryDesc</span><span class="o">-></span><span class="n">plannedstmt</span><span class="o">-></span><span class="n">rtable</span><span class="p">));</span>
<span class="p">}</span>
</pre></div>
<p>Rebuild and reinstall the extension, and restart Postgres. (You can
find the instructions for this above if you've forgotten.) Re-run the
<code>test.sql</code> script. And check the Postgres server logs. You should see:</p>
<div class="highlight"><pre><span></span><span class="go">2023-11-19 18:00:34.184 GMT [3244438] LOG: [pgexec] relid: 1, rtable length: 1</span>
<span class="go">2023-11-19 18:00:34.184 GMT [3244438] STATEMENT: SELECT a FROM x WHERE a > 1;</span>
</pre></div>
<p>Awesome! So <code>rtable</code> does have data in it. There's only one table in
this query, so a length of <code>1</code> makes sense. The <code>scanrelid</code> also being
<code>1</code> is weird, though: it looks like a 1-based index. Let's fetch the nth
value from the <code>rtable</code> list using <code>scanrelid-1</code> as the index.</p>
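<p>To see the off-by-one in isolation, here is a minimal standalone sketch
of a 1-based lookup into 0-based storage. <code>DemoEntry</code> and
<code>demo_fetch</code> are made-up names for illustration and are not part of
Postgres. (Postgres wraps the same subtraction in its own <code>rt_fetch</code>
macro in <code>parser/parsetree.h</code>.)</p>

```c
#include <stddef.h>

/* Hypothetical stand-in for a range table entry; not the Postgres struct. */
typedef struct {
  const char *name;
} DemoEntry;

/*
 * Look up an entry by a 1-based index (like scanrelid), treating 0 as
 * "no entry". Returns NULL when the index is out of range.
 */
const DemoEntry *demo_fetch(const DemoEntry *entries, size_t count,
                            unsigned int index1) {
  if (index1 == 0 || index1 > count) {
    return NULL;
  }
  /* The storage is 0-based, so subtract one. */
  return &entries[index1 - 1];
}
```

<p>With a two-element array, <code>demo_fetch(entries, 2, 1)</code> returns the
first element, and both <code>0</code> and <code>3</code> return
<code>NULL</code>.</p>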
<p>For the
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/parsenodes.h#L1019"><code>RangeTblEntry</code></a>
itself, let's take a look:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">enum</span><span class="w"> </span><span class="n">RTEKind</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">RTE_RELATION</span><span class="p">,</span><span class="w"> </span><span class="cm">/* ordinary relation reference */</span>
<span class="w"> </span><span class="n">RTE_SUBQUERY</span><span class="p">,</span><span class="w"> </span><span class="cm">/* subquery in FROM */</span>
<span class="w"> </span><span class="n">RTE_JOIN</span><span class="p">,</span><span class="w"> </span><span class="cm">/* join */</span>
<span class="w"> </span><span class="n">RTE_FUNCTION</span><span class="p">,</span><span class="w"> </span><span class="cm">/* function in FROM */</span>
<span class="w"> </span><span class="n">RTE_TABLEFUNC</span><span class="p">,</span><span class="w"> </span><span class="cm">/* TableFunc(.., column list) */</span>
<span class="w"> </span><span class="n">RTE_VALUES</span><span class="p">,</span><span class="w"> </span><span class="cm">/* VALUES (<exprlist>), (<exprlist>), ... */</span>
<span class="w"> </span><span class="n">RTE_CTE</span><span class="p">,</span><span class="w"> </span><span class="cm">/* common table expr (WITH list element) */</span>
<span class="w"> </span><span class="n">RTE_NAMEDTUPLESTORE</span><span class="p">,</span><span class="w"> </span><span class="cm">/* tuplestore, e.g. for AFTER triggers */</span>
<span class="w"> </span><span class="n">RTE_RESULT</span><span class="p">,</span><span class="w"> </span><span class="cm">/* RTE represents an empty FROM clause; such</span>
<span class="cm"> * RTEs are added by the planner, they're not</span>
<span class="cm"> * present during parsing or rewriting */</span>
<span class="p">}</span><span class="w"> </span><span class="n">RTEKind</span><span class="p">;</span>
<span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">RangeTblEntry</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">custom_read_write</span><span class="p">,</span><span class="w"> </span><span class="n">custom_query_jumble</span><span class="p">)</span>
<span class="w"> </span><span class="n">NodeTag</span><span class="w"> </span><span class="n">type</span><span class="p">;</span>
<span class="w"> </span><span class="n">RTEKind</span><span class="w"> </span><span class="n">rtekind</span><span class="p">;</span><span class="w"> </span><span class="cm">/* see above */</span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * XXX the fields applicable to only some rte kinds should be merged into</span>
<span class="cm"> * a union. I didn't do this yet because the diffs would impact a lot of</span>
<span class="cm"> * code that is being actively worked on. FIXME someday.</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * Fields valid for a plain relation RTE (else zero):</span>
<span class="cm"> *</span>
<span class="cm"> * rellockmode is really LOCKMODE, but it's declared int to avoid having</span>
<span class="cm"> * to include lock-related headers here. It must be RowExclusiveLock if</span>
<span class="cm"> * the RTE is an INSERT/UPDATE/DELETE/MERGE target, else RowShareLock if</span>
<span class="cm"> * the RTE is a SELECT FOR UPDATE/FOR SHARE target, else AccessShareLock.</span>
<span class="cm"> *</span>
<span class="cm"> * Note: in some cases, rule expansion may result in RTEs that are marked</span>
<span class="cm"> * with RowExclusiveLock even though they are not the target of the</span>
<span class="cm"> * current query; this happens if a DO ALSO rule simply scans the original</span>
<span class="cm"> * target table. We leave such RTEs with their original lockmode so as to</span>
<span class="cm"> * avoid getting an additional, lesser lock.</span>
<span class="cm"> *</span>
<span class="cm"> * perminfoindex is 1-based index of the RTEPermissionInfo belonging to</span>
<span class="cm"> * this RTE in the containing struct's list of same; 0 if permissions need</span>
<span class="cm"> * not be checked for this RTE.</span>
<span class="cm"> *</span>
<span class="cm"> * As a special case, relid, relkind, rellockmode, and perminfoindex can</span>
<span class="cm"> * also be set (nonzero) in an RTE_SUBQUERY RTE. This occurs when we</span>
<span class="cm"> * convert an RTE_RELATION RTE naming a view into an RTE_SUBQUERY</span>
<span class="cm"> * containing the view's query. We still need to perform run-time locking</span>
<span class="cm"> * and permission checks on the view, even though it's not directly used</span>
<span class="cm"> * in the query anymore, and the most expedient way to do that is to</span>
<span class="cm"> * retain these fields from the old state of the RTE.</span>
<span class="cm"> *</span>
<span class="cm"> * As a special case, RTE_NAMEDTUPLESTORE can also set relid to indicate</span>
<span class="cm"> * that the tuple format of the tuplestore is the same as the referenced</span>
<span class="cm"> * relation. This allows plans referencing AFTER trigger transition</span>
<span class="cm"> * tables to be invalidated if the underlying table is altered.</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">relid</span><span class="p">;</span><span class="w"> </span><span class="cm">/* OID of the relation */</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">relkind</span><span class="p">;</span><span class="w"> </span><span class="cm">/* relation kind (see pg_class.relkind) */</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">rellockmode</span><span class="p">;</span><span class="w"> </span><span class="cm">/* lock level that query requires on the rel */</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">TableSampleClause</span><span class="w"> </span><span class="o">*</span><span class="n">tablesample</span><span class="p">;</span><span class="w"> </span><span class="cm">/* sampling info, or NULL */</span>
<span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="n">perminfoindex</span><span class="p">;</span>
</pre></div>
<p>In <code>SELECT a FROM x</code>, <code>x</code> should be a plain relation RTE (to use the
terminology there). So we can add a guard that validates that. But we
don't get a <code>Relation</code>. (You might remember from my <a href="https://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html">previous
post</a>
that <code>Relation</code> is where we can finally see the table name.)</p>
<p>We get an <code>Oid</code> for the <code>Relation</code>. So we need to find a way to look up
a <code>Relation</code> from an <code>Oid</code>. And by grepping around in Postgres (or via
judicious use of ChatGPT, I confess), we can notice
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/backend/utils/cache/relcache.c#L2056"><code>RelationIdGetRelation</code></a>
that takes an <code>Oid</code> and returns a <code>Relation</code>. Notice also that the
comment says we should close the relation with <code>RelationClose</code> when
we're done.</p>
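<p>As I read that comment, the open/close pairing is reference counting:
<code>RelationIdGetRelation</code> bumps a count, <code>RelationClose</code> drops it, and
the lookup can return <code>NULL</code> if there is no such relation. Here is a
standalone sketch of that discipline. Everything in it
(<code>DemoRelation</code>, <code>demo_open</code>, <code>demo_close</code>,
<code>demo_table_name</code>, the OIDs) is hypothetical and much simpler than
the real relcache.</p>

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified cache entry; not Postgres's RelationData. */
typedef struct {
  unsigned int oid;
  const char *name;
  int refcount;
} DemoRelation;

/* Made-up OIDs and names, purely for illustration. */
static DemoRelation demo_cache[] = {
  {16384, "x", 0},
  {16385, "y", 0},
};

/* Look up by OID and take a reference. Returns NULL if nothing matches. */
DemoRelation *demo_open(unsigned int oid) {
  size_t n = sizeof(demo_cache) / sizeof(demo_cache[0]);
  for (size_t i = 0; i < n; i++) {
    if (demo_cache[i].oid == oid) {
      demo_cache[i].refcount++;
      return &demo_cache[i];
    }
  }
  return NULL;
}

/* Give the reference back. */
void demo_close(DemoRelation *rel) {
  rel->refcount--;
}

/*
 * Copy the name out while we still hold the reference, then release it.
 * Returns 0 on success and -1 if the OID is unknown or out has no room.
 */
int demo_table_name(unsigned int oid, char *out, size_t outlen) {
  DemoRelation *rel = NULL;
  if (outlen == 0) {
    return -1;
  }
  rel = demo_open(oid);
  if (rel == NULL) {
    return -1;
  }
  snprintf(out, outlen, "%s", rel->name);
  demo_close(rel);
  return 0;
}
```

<p>The two things to take from it: check for <code>NULL</code> before
dereferencing, and finish using the name (or copy it) before closing.</p>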
<p>So putting it all together (and again, reusing some code from that
previous post), we can print out the table name.</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">print_plan</span><span class="p">(</span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">SeqScan</span><span class="o">*</span><span class="w"> </span><span class="n">scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="n">RangeTblEntry</span><span class="o">*</span><span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-></span><span class="n">plannedstmt</span><span class="o">-></span><span class="n">planTree</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-></span><span class="n">type</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">T_SeqScan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[pgexec] Unsupported plan type."</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">SeqScan</span><span class="o">*</span><span class="p">)</span><span class="n">plan</span><span class="p">;</span>
<span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">list_nth</span><span class="p">(</span><span class="n">queryDesc</span><span class="o">-></span><span class="n">plannedstmt</span><span class="o">-></span><span class="n">rtable</span><span class="p">,</span><span class="w"> </span><span class="n">scan</span><span class="o">-></span><span class="n">scan</span><span class="p">.</span><span class="n">scanrelid</span><span class="mi">-1</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rte</span><span class="o">-></span><span class="n">rtekind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">RTE_RELATION</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[pgexec] Unsupported FROM type: %d."</span><span class="p">,</span><span class="w"> </span><span class="n">rte</span><span class="o">-></span><span class="n">rtekind</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">relation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RelationIdGetRelation</span><span class="p">(</span><span class="n">rte</span><span class="o">-></span><span class="n">relid</span><span class="p">);</span>
<span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">NameStr</span><span class="p">(</span><span class="n">relation</span><span class="o">-></span><span class="n">rd_rel</span><span class="o">-></span><span class="n">relname</span><span class="p">);</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[pgexec] SELECT [todo] FROM %s"</span><span class="p">,</span><span class="w"> </span><span class="n">tablename</span><span class="p">);</span>
<span class="w"> </span><span class="n">RelationClose</span><span class="p">(</span><span class="n">relation</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>You'll also need to add a new <code>#include</code> for
<code>utils/rel.h</code>.</p>
<p>Rebuild and reinstall the extension, and restart Postgres. Re-run the
<code>test.sql</code> script. Check the Postgres server logs and you should see:</p>
<div class="highlight"><pre><span></span><span class="go">2023-11-19 18:36:03.986 GMT [3246777] LOG: [pgexec] SELECT [todo] FROM x</span>
<span class="go">2023-11-19 18:36:03.986 GMT [3246777] STATEMENT: SELECT a FROM x WHERE a > 1;</span>
</pre></div>
<p>Fantastic! Before we get into walking the <code>SELECT</code> columns and the
(optional) <code>WHERE</code> clause, let's do some quick refactoring.</p>
<h3 id="a-string-builder">A string builder</h3><p>Let's add a little string builder library so we can emit a single
string we build up to a single <code>elog()</code> call.</p>
<p>I wrote this ahead of time and won't explain it here since the details
aren't relevant.</p>
<p>Just copy this and paste it near the top of <code>pgexec.c</code>:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">mem</span><span class="p">;</span>
<span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">len</span><span class="p">;</span>
<span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">offset</span><span class="p">;</span>
<span class="p">}</span><span class="w"> </span><span class="n">PGExec_Buffer</span><span class="p">;</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_init</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">8</span><span class="p">;</span>
<span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">mem</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">char</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">len</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_resize_to_fit_additional</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">additional</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">newsize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">Assert</span><span class="p">(</span><span class="n">additional</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">additional</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">newsize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">additional</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span>
<span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">char</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">newsize</span><span class="p">);</span>
<span class="w"> </span><span class="n">Assert</span><span class="p">(</span><span class="n">new</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span>
<span class="w"> </span><span class="n">memcpy</span><span class="p">(</span><span class="n">new</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">mem</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">len</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="kt">char</span><span class="p">));</span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="o">-></span><span class="n">mem</span><span class="p">);</span>
<span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">newsize</span><span class="p">;</span>
<span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">mem</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_append</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="kt">size_t</span><span class="p">);</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_appendz</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">c</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buffer_append</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="p">,</span><span class="w"> </span><span class="n">strlen</span><span class="p">(</span><span class="n">c</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_append</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">c</span><span class="p">,</span><span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">chars</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buffer_resize_to_fit_additional</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">chars</span><span class="p">);</span>
<span class="w"> </span><span class="n">memcpy</span><span class="p">(</span><span class="n">buf</span><span class="o">-></span><span class="n">mem</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="p">,</span><span class="w"> </span><span class="n">chars</span><span class="p">);</span>
<span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">chars</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_appendf</span><span class="p">(</span>
<span class="w"> </span><span class="n">PGExec_Buffer</span><span class="w"> </span><span class="o">*</span><span class="p">,</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="kr">restrict</span><span class="p">,</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">)</span><span class="w"> </span><span class="n">__attribute__</span><span class="w"> </span><span class="p">((</span><span class="n">format</span><span class="w"> </span><span class="p">(</span><span class="n">gnu_printf</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">)));</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_appendf</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="kr">restrict</span><span class="w"> </span><span class="n">fmt</span><span class="p">,</span><span class="w"> </span><span class="p">...)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// First figure out how long the result will be.</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">chars</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kt">va_list</span><span class="w"> </span><span class="n">arglist</span><span class="p">;</span>
<span class="w"> </span><span class="n">va_start</span><span class="p">(</span><span class="n">arglist</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="p">);</span>
<span class="w"> </span><span class="n">chars</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">vsnprintf</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="p">,</span><span class="w"> </span><span class="n">arglist</span><span class="p">);</span>
<span class="w"> </span><span class="n">Assert</span><span class="p">(</span><span class="n">chars</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"> </span><span class="c1">// TODO: error handling.</span>
<span class="w"> </span><span class="c1">// Resize to fit result.</span>
<span class="w"> </span><span class="n">buffer_resize_to_fit_additional</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">chars</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Actually do the printf into buf.</span>
<span class="w"> </span><span class="n">va_end</span><span class="p">(</span><span class="n">arglist</span><span class="p">);</span>
<span class="w"> </span><span class="n">va_start</span><span class="p">(</span><span class="n">arglist</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="p">);</span>
<span class="w"> </span><span class="n">chars</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">vsprintf</span><span class="p">(</span><span class="n">buf</span><span class="o">-></span><span class="n">mem</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="p">,</span><span class="w"> </span><span class="n">arglist</span><span class="p">);</span>
<span class="w"> </span><span class="n">Assert</span><span class="p">(</span><span class="n">chars</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"> </span><span class="c1">// TODO: error handling.</span>
<span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">chars</span><span class="p">;</span>
<span class="w"> </span><span class="n">va_end</span><span class="p">(</span><span class="n">arglist</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="nf">buffer_cstring</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">zero</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">prev_offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buffer_append</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">zero</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="o">--</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">mem</span><span class="p">[</span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Offset should stay the same. The trailing NUL is not counted.</span>
<span class="w"> </span><span class="n">Assert</span><span class="p">(</span><span class="n">buf</span><span class="o">-></span><span class="n">offset</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">prev_offset</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">buf</span><span class="o">-></span><span class="n">mem</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_free</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="o">-></span><span class="n">mem</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Next we'll modify <code>print_plan()</code> in <code>pgexec.c</code> to use it, and add stubs
for printing the <code>SELECT</code> columns and <code>WHERE</code> clauses.</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_where</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">" [where todo]"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_select_columns</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"[columns todo]"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">print_plan</span><span class="p">(</span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">SeqScan</span><span class="o">*</span><span class="w"> </span><span class="n">scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="n">RangeTblEntry</span><span class="o">*</span><span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-></span><span class="n">plannedstmt</span><span class="o">-></span><span class="n">planTree</span><span class="p">;</span>
<span class="w"> </span><span class="n">PGExec_Buffer</span><span class="w"> </span><span class="n">buf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-></span><span class="n">type</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">T_SeqScan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[pgexec] Unsupported plan type."</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">SeqScan</span><span class="o">*</span><span class="p">)</span><span class="n">plan</span><span class="p">;</span>
<span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">list_nth</span><span class="p">(</span><span class="n">queryDesc</span><span class="o">-></span><span class="n">plannedstmt</span><span class="o">-></span><span class="n">rtable</span><span class="p">,</span><span class="w"> </span><span class="n">scan</span><span class="o">-></span><span class="n">scan</span><span class="p">.</span><span class="n">scanrelid</span><span class="mi">-1</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rte</span><span class="o">-></span><span class="n">rtekind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">RTE_RELATION</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[pgexec] Unsupported FROM type: %d."</span><span class="p">,</span><span class="w"> </span><span class="n">rte</span><span class="o">-></span><span class="n">rtekind</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">buffer_init</span><span class="p">(</span><span class="o">&</span><span class="n">buf</span><span class="p">);</span>
<span class="w"> </span><span class="n">relation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RelationIdGetRelation</span><span class="p">(</span><span class="n">rte</span><span class="o">-></span><span class="n">relid</span><span class="p">);</span>
<span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">NameStr</span><span class="p">(</span><span class="n">relation</span><span class="o">-></span><span class="n">rd_rel</span><span class="o">-></span><span class="n">relname</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="o">&</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"SELECT "</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_print_select_columns</span><span class="p">(</span><span class="o">&</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="o">&</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">" FROM %s"</span><span class="p">,</span><span class="w"> </span><span class="n">tablename</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_print_where</span><span class="p">(</span><span class="o">&</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="p">);</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[pgexec] %s"</span><span class="p">,</span><span class="w"> </span><span class="n">buffer_cstring</span><span class="p">(</span><span class="o">&</span><span class="n">buf</span><span class="p">));</span>
<span class="w"> </span><span class="n">RelationClose</span><span class="p">(</span><span class="n">relation</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_free</span><span class="p">(</span><span class="o">&</span><span class="n">buf</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Now we just need to implement the <code>buffer_print_where</code> and
<code>buffer_print_select_columns</code> functions and our walking infrastructure
will be done! For now. :)</p>
<h3 id="walking-the-<code>where</code>-clause">Walking the <code>WHERE</code> clause</h3><p>If you remember back to the <code>SeqScan</code> and <code>Scan</code> nodes, they were both
basically empty. They had a <code>Plan</code> and a <code>scanrelid</code>. So the rest of
the <code>SELECT</code> info must be in the <code>Plan</code> since it wasn't in the <code>Scan</code>.</p>
<p>Let's look at
<a href="https://github.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h#L119"><code>Plan</code></a>
again. One part that stands out is:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * Common structural data for all Plan types.</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">plan_node_id</span><span class="p">;</span><span class="w"> </span><span class="cm">/* unique across entire final plan tree */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">targetlist</span><span class="p">;</span><span class="w"> </span><span class="cm">/* target list to be computed at this node */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">qual</span><span class="p">;</span><span class="w"> </span><span class="cm">/* implicitly-ANDed qual conditions */</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Plan</span><span class="w"> </span><span class="o">*</span><span class="n">lefttree</span><span class="p">;</span><span class="w"> </span><span class="cm">/* input plan tree(s) */</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Plan</span><span class="w"> </span><span class="o">*</span><span class="n">righttree</span><span class="p">;</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">initPlan</span><span class="p">;</span><span class="w"> </span><span class="cm">/* Init Plan nodes (un-correlated expr</span>
<span class="cm"> * subselects) */</span>
</pre></div>
<p><code>qual</code> kinda looks like a <code>WHERE</code> clause. (And <code>targetlist</code> kinda
looks like the columns the <code>SELECT</code> pulls.)</p>
<p><a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/pg_list.h#L53"><code>List</code></a>s
just contain void pointers, so we can't tell what type the children of
<code>qual</code> or <code>targetlist</code> are. But I'm going to make a wild guess they
are <code>Node</code>s.</p>
<p>There's even a nice helper that casts void pointers to <code>Node*</code> and
pulls out the type,
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/nodes.h#L133"><code>nodeTag()</code></a>.</p>
<p>And reading around <code>pg_list.h</code> shows some interesting helper utilities
like
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/pg_list.h#L373"><code>foreach</code></a>
that we can use to iterate the list.</p>
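<p>To see why reading a tag out of a void pointer works at all, here is a
minimal standalone sketch of the same idea. It does not use Postgres
headers: <code>ToyNode</code>, <code>ToyTag</code>, <code>toy_node_tag</code> and <code>toy_count_tag</code> are
made-up names for illustration, and a plain array stands in for
<code>List</code>. The one thing it assumes is that every node struct begins with
its tag.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Toy tags for illustration only; these are not Postgres's NodeTag values. */
typedef enum ToyTag { TOY_INVALID = 0, TOY_VAR, TOY_CONST, TOY_OPEXPR } ToyTag;

/* Minimal header shared by every node: just the tag. */
typedef struct ToyNode { ToyTag type; } ToyNode;

/* Concrete nodes embed the header as their first member. */
typedef struct ToyVar { ToyNode node; int attno; } ToyVar;
typedef struct ToyConst { ToyNode node; long value; } ToyConst;

/* Cast any node pointer to the header type, then read the tag. */
#define toy_node_tag(ptr) (((const ToyNode*)(ptr))->type)

/* Counts how many of the n untyped entries carry the wanted tag. */
static size_t toy_count_tag(void* const* items, size_t n, ToyTag wanted) {
  size_t count = 0;
  for (size_t i = 0; i < n; i++) {
    assert(items[i] != NULL);
    if (toy_node_tag(items[i]) == wanted) {
      count++;
    }
  }
  return count;
}
```

<p>The cast is fine here because a pointer to a struct can be converted
to a pointer to its first member. That is why an untyped list of these
structs can still be inspected one tag at a time.</p>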
<p>Let's try printing out the type of <code>qual</code>'s members.</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_where</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ListCell</span><span class="o">*</span><span class="w"> </span><span class="n">cell</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">true</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-></span><span class="n">qual</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">" WHERE "</span><span class="p">);</span>
<span class="w"> </span><span class="n">foreach</span><span class="p">(</span><span class="n">cell</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="o">-></span><span class="n">qual</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">first</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">" AND "</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"[node: %d]"</span><span class="p">,</span><span class="w"> </span><span class="n">nodeTag</span><span class="p">(</span><span class="n">lfirst</span><span class="p">(</span><span class="n">cell</span><span class="p">)));</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p class="note">
Notice any <a
href="https://twitter.com/eatonphil/status/1726265982094819631">vestiges
of LISP</a>?
</p><p>Rebuild and reinstall the extension, and restart Postgres. Re-run the
<code>test.sql</code> script. Check the Postgres server logs and you should see:</p>
<div class="highlight"><pre><span></span><span class="go">2023-11-19 19:17:00.879 GMT [3250850] LOG: [pgexec] SELECT [columns todo] FROM x WHERE [node: 15]</span>
<span class="go">2023-11-19 19:17:00.879 GMT [3250850] STATEMENT: SELECT a FROM x WHERE a > 1;</span>
</pre></div>
<p>Well, our code didn't crash! So the guess about <code>qual</code> <code>List</code> entries
being <code>Node</code>s seems right. Let's look up that node type in the
Postgres repo:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>grep<span class="w"> </span><span class="s1">' = 15,'</span><span class="w"> </span>src/include/nodes/nodetags.h
<span class="go"> T_OpExpr = 15,</span>
</pre></div>
<p>Woot! That is exactly what I'd expect the <code>WHERE</code> clause here to be.</p>
<p>Now that we know <code>qual</code> is a <code>List</code> of <code>Node</code>s, let's do a bit of
refactoring since <code>targetlist</code> will probably also be a <code>List</code> of
<code>Node</code>s. Back in <code>pgexec.c</code>:</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">Node</span><span class="o">*</span><span class="p">);</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_list</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">List</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="p">);</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_opexpr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">OpExpr</span><span class="o">*</span><span class="w"> </span><span class="n">op</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"[opexpr: todo]"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">Node</span><span class="o">*</span><span class="w"> </span><span class="n">expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_OpExpr</span><span class="p">:</span>
<span class="w"> </span><span class="n">buffer_print_opexpr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">OpExpr</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"[Unknown: %d]"</span><span class="p">,</span><span class="w"> </span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_list</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">List</span><span class="o">*</span><span class="w"> </span><span class="n">list</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">sep</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ListCell</span><span class="o">*</span><span class="w"> </span><span class="n">cell</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">true</span><span class="p">;</span>
<span class="w"> </span><span class="n">foreach</span><span class="p">(</span><span class="n">cell</span><span class="p">,</span><span class="w"> </span><span class="n">list</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">first</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">sep</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Node</span><span class="o">*</span><span class="p">)</span><span class="n">lfirst</span><span class="p">(</span><span class="n">cell</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_where</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-></span><span class="n">qual</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">" WHERE "</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_print_list</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="o">-></span><span class="n">qual</span><span class="p">,</span><span class="w"> </span><span class="s">" AND "</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
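<p>The shape of this refactor, a switch on the tag plus a list printer
that takes a separator, can be tried outside of Postgres too. This is a
toy sketch and not the extension's code: all of the <code>Toy*</code> and
<code>toy_*</code> names are hypothetical, a fixed-size <code>char</code> array stands in
for <code>PGExec_Buffer</code>, and a plain array stands in for <code>List</code>.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy tags for illustration only; not Postgres's NodeTag values. */
typedef enum ToyTag { TOY_CONST = 1, TOY_VAR, TOY_UNKNOWN_KIND } ToyTag;
typedef struct ToyNode { ToyTag type; } ToyNode;
typedef struct ToyConst { ToyNode node; int value; } ToyConst;
typedef struct ToyVar { ToyNode node; const char* name; } ToyVar;

/* Appends src to the NUL-terminated string in out. cap is the total
 * capacity of out, including the terminator. */
static void toy_append(char* out, size_t cap, const char* src) {
  size_t used = strlen(out);
  assert(used + strlen(src) < cap);
  memcpy(out + used, src, strlen(src) + 1);
}

/* Dispatches on the tag; unknown tags are printed rather than rejected. */
static void toy_print_expr(char* out, size_t cap, const ToyNode* expr) {
  char tmp[64];
  switch (expr->type) {
  case TOY_CONST:
    snprintf(tmp, sizeof(tmp), "%d", ((const ToyConst*)expr)->value);
    break;
  case TOY_VAR:
    snprintf(tmp, sizeof(tmp), "%s", ((const ToyVar*)expr)->name);
    break;
  default:
    snprintf(tmp, sizeof(tmp), "[Unknown: %d]", (int)expr->type);
  }
  toy_append(out, cap, tmp);
}

/* Prints each item, writing sep between items but not before the first. */
static void toy_print_list(char* out, size_t cap, const ToyNode* const* items,
                           size_t n, const char* sep) {
  for (size_t i = 0; i < n; i++) {
    if (i > 0) {
      toy_append(out, cap, sep);
    }
    toy_print_expr(out, cap, items[i]);
  }
}
```

<p>Passing the separator in is what lets one list printer serve both
cases: <code>" AND "</code> for qualifiers and, presumably, <code>", "</code> for columns.</p>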
<p>And let's check out <code>OpExpr</code>!</p>
<h3 id="walking-<code>opexpr</code>">Walking <code>OpExpr</code></h3><p>Take a look at the definition of
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/primnodes.h#L748"><code>OpExpr</code></a>:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">OpExpr</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">Expr</span><span class="w"> </span><span class="n">xpr</span><span class="p">;</span>
<span class="w"> </span><span class="cm">/* PG_OPERATOR OID of the operator */</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">opno</span><span class="p">;</span>
<span class="w"> </span><span class="cm">/* PG_PROC OID of underlying function */</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">opfuncid</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">equal_ignore_if_zero</span><span class="p">,</span><span class="w"> </span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* PG_TYPE OID of result value */</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">opresulttype</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* true if operator returns set */</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">opretset</span><span class="w"> </span><span class="nf">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* OID of collation of result */</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">opcollid</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* OID of collation that operator should use */</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">inputcollid</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* arguments to the operator (1 or 2) */</span>
<span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">args</span><span class="p">;</span>
<span class="w"> </span><span class="cm">/* token location, or -1 if unknown */</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">location</span><span class="p">;</span>
<span class="p">}</span><span class="w"> </span><span class="n">OpExpr</span><span class="p">;</span>
</pre></div>
<p>The important fields are <code>opno</code>, the <code>Oid</code> of the operator, and
<code>args</code>. <code>args</code> looks like another <code>List</code> of <code>Node</code>s so we already know
how to handle that.</p>
<p>But how do we find the string name of the operator? Presumably there's
infrastructure like <code>RelationIdGetRelation</code> that takes an <code>Oid</code> and
gets us an operator object.</p>
<p>Well, I got stuck here as well. Again, thankfully, ChatGPT gave me some
suggestions. There's no great story for how I got it working. So here's
the finished <code>buffer_print_opexpr</code>, replacing the stub from earlier.</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_opexpr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">OpExpr</span><span class="o">*</span><span class="w"> </span><span class="n">op</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">HeapTuple</span><span class="w"> </span><span class="n">opertup</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">SearchSysCache1</span><span class="p">(</span><span class="n">OPEROID</span><span class="p">,</span><span class="w"> </span><span class="n">ObjectIdGetDatum</span><span class="p">(</span><span class="n">op</span><span class="o">-></span><span class="n">opno</span><span class="p">));</span>
<span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">lfirst</span><span class="p">(</span><span class="n">list_nth_cell</span><span class="p">(</span><span class="n">op</span><span class="o">-></span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)));</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">HeapTupleIsValid</span><span class="p">(</span><span class="n">opertup</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Form_pg_operator</span><span class="w"> </span><span class="n">operator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">Form_pg_operator</span><span class="p">)</span><span class="n">GETSTRUCT</span><span class="p">(</span><span class="n">opertup</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">" %s "</span><span class="p">,</span><span class="w"> </span><span class="n">NameStr</span><span class="p">(</span><span class="n">operator</span><span class="o">-></span><span class="n">oprname</span><span class="p">));</span>
<span class="w"> </span><span class="n">ReleaseSysCache</span><span class="p">(</span><span class="n">opertup</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"[Unknown operation: %d]"</span><span class="p">,</span><span class="w"> </span><span class="n">op</span><span class="o">-></span><span class="n">opno</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// TODO: Support single operand operations.</span>
<span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">lfirst</span><span class="p">(</span><span class="n">list_nth_cell</span><span class="p">(</span><span class="n">op</span><span class="o">-></span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)));</span>
<span class="p">}</span>
</pre></div>
<p>And add the following two includes to the top of <code>pgexec.c</code>:</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"catalog/pg_operator.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"utils/syscache.h"</span>
</pre></div>
<p>Rebuild and reinstall the extension, and restart Postgres. Re-run the
<code>test.sql</code> script. Check the Postgres server logs and you should see:</p>
<div class="highlight"><pre><span></span><span class="go">2023-11-19 19:42:52.916 GMT [3252974] LOG: [pgexec] SELECT [columns todo] FROM x WHERE [Unknown: 6] > [Unknown: 7]</span>
<span class="go">2023-11-19 19:42:52.916 GMT [3252974] STATEMENT: SELECT a FROM x WHERE a > 1;</span>
</pre></div>
<p>And we continue to make progress! Let's look up the type of these two
unknown nodes.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>grep<span class="w"> </span><span class="s1">' = 6,'</span><span class="w"> </span>src/include/nodes/nodetags.h
<span class="go"> T_Var = 6,</span>
<span class="gp">$ </span>grep<span class="w"> </span><span class="s1">' = 7,'</span><span class="w"> </span>src/include/nodes/nodetags.h
<span class="go"> T_Const = 7,</span>
</pre></div>
<p>Let's deal with <code>Const</code> first.</p>
<h3 id="walking-<code>const</code>">Walking <code>Const</code></h3><p>If we take a look at the
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/primnodes.h#L292"><code>Const</code></a>
definition:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Const</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">custom_copy_equal</span><span class="p">,</span><span class="w"> </span><span class="n">custom_read_write</span><span class="p">)</span>
<span class="w"> </span><span class="n">Expr</span><span class="w"> </span><span class="n">xpr</span><span class="p">;</span>
<span class="w"> </span><span class="cm">/* pg_type OID of the constant's datatype */</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">consttype</span><span class="p">;</span>
<span class="w"> </span><span class="cm">/* typmod value, if any */</span>
<span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">consttypmod</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* OID of collation, or InvalidOid if none */</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">constcollid</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* typlen of the constant's datatype */</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">constlen</span><span class="w"> </span><span class="nf">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* the constant's value */</span>
<span class="w"> </span><span class="n">Datum</span><span class="w"> </span><span class="n">constvalue</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* whether the constant is null (if true, constvalue is undefined) */</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">constisnull</span><span class="w"> </span><span class="nf">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * Whether this datatype is passed by value. If true, then all the</span>
<span class="cm"> * information is stored in the Datum. If false, then the Datum contains</span>
<span class="cm"> * a pointer to the information.</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">constbyval</span><span class="w"> </span><span class="nf">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * token location, or -1 if unknown. All constants are tracked as</span>
<span class="cm"> * locations in query jumbling, to be marked as parameters.</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">location</span><span class="w"> </span><span class="nf">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_location</span><span class="p">);</span>
<span class="p">}</span><span class="w"> </span><span class="n">Const</span><span class="p">;</span>
</pre></div>
<p>It looks like we need to switch on the <code>consttype</code> (an <code>Oid</code>) to
figure out how to interpret the <code>constvalue</code> (a <code>Datum</code>). Remember I
mentioned earlier that how to interpret a <code>Datum</code> is dependent on
context. <code>consttype</code> is the context here.</p>
<p>Earlier, when we had an <code>Oid</code>, we had to use Postgres
infrastructure to look up the object it refers to. In this case we can
skip the lookup: builtin types have fixed <code>Oid</code>s, and the
literals we've queried with are among them.</p>
<p>We can simply check if <code>consttype == INT4OID</code> and then interpret the
<code>Datum</code> as an <code>int32</code> if so. <code>DatumGetInt32</code> will get us that
<code>int32</code>.</p>
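<p>To make the "type decides how to read the bits" idea concrete outside of
Postgres, here is a minimal standalone sketch. Everything prefixed with
<code>Mini</code>/<code>mini_</code> is a hypothetical stand-in I made up so it compiles
without the Postgres headers; only the numeric values 16 and 23 are borrowed
from the real <code>BOOLOID</code> and <code>INT4OID</code>.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins, not the Postgres definitions. */
typedef uintptr_t MiniDatum; /* like Datum: a pointer-sized bag of bits */
typedef unsigned int MiniOid;

/* Same numeric values as BOOLOID and INT4OID, renamed for this sketch. */
enum { MINI_BOOLOID = 16, MINI_INT4OID = 23 };

/* Format `value` into `out`, using `type` as the context that says how to
 * interpret it. Returns false for types this sketch does not handle. */
static bool mini_format_datum(MiniOid type, MiniDatum value, char *out, size_t outlen) {
  assert(out != NULL && outlen > 0);
  switch (type) {
  case MINI_INT4OID:
    /* Same idea as DatumGetInt32: keep only the low 32 bits. */
    snprintf(out, outlen, "%d", (int)(int32_t)value);
    return true;
  case MINI_BOOLOID:
    snprintf(out, outlen, "%s", value != 0 ? "true" : "false");
    return true;
  default:
    snprintf(out, outlen, "[Unknown type oid: %u]", type);
    return false;
  }
}
```

<p>The same bits (say, <code>1</code>) come out as <code>1</code> or <code>true</code> depending only
on the type passed alongside them, which is the situation we are in with
<code>constvalue</code> and <code>consttype</code>.</p>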
<p>To support the new <code>Const</code> type, we'll add a case in
<code>buffer_print_expr</code> to look for a <code>T_Const</code>.</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">Node</span><span class="o">*</span><span class="w"> </span><span class="n">expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_Const</span><span class="p">:</span>
<span class="w"> </span><span class="n">buffer_print_const</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Const</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_OpExpr</span><span class="p">:</span>
<span class="w"> </span><span class="n">buffer_print_opexpr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">OpExpr</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"[Unknown: %d]"</span><span class="p">,</span><span class="w"> </span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
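<p>If the switch-then-cast pattern above looks odd, this standalone sketch
shows why it works. The <code>Mini*</code> types are hypothetical miniatures of my
own, not Postgres structs: each concrete node begins with a common header
holding a tag, so a generic pointer can be inspected first and cast
afterwards. The tag values 6 and 7 simply mirror the <code>grep</code> output
above.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical miniature of the node convention. */
typedef enum { MINI_T_Var = 6, MINI_T_Const = 7 } MiniNodeTag;

typedef struct { MiniNodeTag type; } MiniNode;

/* Concrete nodes embed the header as their first member, so a pointer to
 * the node is also a valid pointer to its MiniNode header. */
typedef struct { MiniNode node; int value; } MiniConst;
typedef struct { MiniNode node; const char *name; } MiniVar;

#define mini_node_tag(ptr) (((const MiniNode *)(ptr))->type)

/* Dispatch on the tag, then cast to the concrete type. Returns the
 * snprintf result (number of characters that were needed). */
static int mini_print_expr(char *out, size_t outlen, const void *expr) {
  assert(out != NULL && expr != NULL);
  switch (mini_node_tag(expr)) {
  case MINI_T_Const:
    return snprintf(out, outlen, "%d", ((const MiniConst *)expr)->value);
  case MINI_T_Var:
    return snprintf(out, outlen, "%s", ((const MiniVar *)expr)->name);
  default:
    return snprintf(out, outlen, "[Unknown: %d]", (int)mini_node_tag(expr));
  }
}
```

<p>The <code>default</code> branch is what has been producing the
<code>[Unknown: N]</code> placeholders in the logs: any tag without a case falls
through to it.</p>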
<p>And add a new function, <code>buffer_print_const</code>:</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_const</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">Const</span><span class="o">*</span><span class="w"> </span><span class="n">cnst</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">cnst</span><span class="o">-></span><span class="n">consttype</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">INT4OID</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Braces: before C23, a declaration cannot directly follow a case label.</span>
<span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">DatumGetInt32</span><span class="p">(</span><span class="n">cnst</span><span class="o">-></span><span class="n">constvalue</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"%d"</span><span class="p">,</span><span class="w"> </span><span class="n">val</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"[Unknown consttype oid: %d]"</span><span class="p">,</span><span class="w"> </span><span class="n">cnst</span><span class="o">-></span><span class="n">consttype</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Rebuild and reinstall the extension, and restart Postgres. Re-run the
<code>test.sql</code> script. Check the Postgres server logs and you should see:</p>
<div class="highlight"><pre><span></span><span class="go">2023-11-19 19:53:47.922 GMT [3253746] LOG: [pgexec] SELECT [columns todo] FROM x WHERE [Unknown: 6] > 1</span>
<span class="go">2023-11-19 19:53:47.922 GMT [3253746] STATEMENT: SELECT a FROM x WHERE a > 1;</span>
</pre></div>
<p>Great! Now we just have to tackle <code>T_Var</code>.</p>
<h3 id="walking-<code>var</code>">Walking <code>Var</code></h3><p>Let's take a look at the definition of <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/primnodes.h#L233"><code>Var</code></a>:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Var</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">Expr</span><span class="w"> </span><span class="n">xpr</span><span class="p">;</span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * index of this var's relation in the range table, or</span>
<span class="cm"> * INNER_VAR/OUTER_VAR/etc</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">varno</span><span class="p">;</span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * attribute number of this var, or zero for all attrs ("whole-row Var")</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="n">AttrNumber</span><span class="w"> </span><span class="n">varattno</span><span class="p">;</span>
<span class="w"> </span><span class="cm">/* pg_type OID for the type of this var */</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">vartype</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* pg_attribute typmod value */</span>
<span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">vartypmod</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* OID of collation, or InvalidOid if none */</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">varcollid</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * RT indexes of outer joins that can replace the Var's value with null.</span>
<span class="cm"> * We can omit varnullingrels in the query jumble, because it's fully</span>
<span class="cm"> * determined by varno/varlevelsup plus the Var's query location.</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="n">Bitmapset</span><span class="w"> </span><span class="o">*</span><span class="n">varnullingrels</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * for subquery variables referencing outer relations; 0 in a normal var,</span>
<span class="cm"> * >0 means N levels up</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="n">varlevelsup</span><span class="p">;</span>
<span class="w"> </span><span class="cm">/*</span>
<span class="cm"> * varnosyn/varattnosyn are ignored for equality, because Vars with</span>
<span class="cm"> * different syntactic identifiers are semantically the same as long as</span>
<span class="cm"> * their varno/varattno match.</span>
<span class="cm"> */</span>
<span class="w"> </span><span class="cm">/* syntactic relation index (0 if unknown) */</span>
<span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="n">varnosyn</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">equal_ignore</span><span class="p">,</span><span class="w"> </span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* syntactic attribute number */</span>
<span class="w"> </span><span class="n">AttrNumber</span><span class="w"> </span><span class="n">varattnosyn</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">equal_ignore</span><span class="p">,</span><span class="w"> </span><span class="n">query_jumble_ignore</span><span class="p">);</span>
<span class="w"> </span><span class="cm">/* token location, or -1 if unknown */</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">location</span><span class="p">;</span>
<span class="p">}</span><span class="w"> </span><span class="n">Var</span><span class="p">;</span>
</pre></div>
<p>It looks like this refers to a relation in the range table list
again. So this means we need to have access to the full <code>PlannedStmt</code>
so we can read its <code>rtable</code> field again to find the table. Then we
need to look up the <code>Relation</code> for the table and then we can use the
<code>Var</code>'s <code>varattno</code> field to pick the nth attribute from the relation
and get its string representation.</p>
<p>However, ChatGPT found a slightly higher-level function:
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/backend/utils/cache/lsyscache.c#L826"><code>get_attname()</code></a>
that takes a relation <code>Oid</code> and an attribute index and returns the
string name of the column.</p>
<p>So here's what <code>buffer_print_var</code> looks like:</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_var</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">Var</span><span class="o">*</span><span class="w"> </span><span class="n">var</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="n">RangeTblEntry</span><span class="o">*</span><span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">list_nth</span><span class="p">(</span><span class="n">stmt</span><span class="o">-></span><span class="n">rtable</span><span class="p">,</span><span class="w"> </span><span class="n">var</span><span class="o">-></span><span class="n">varno</span><span class="mi">-1</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rte</span><span class="o">-></span><span class="n">rtekind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">RTE_RELATION</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[Unsupported relation type for var: %d]."</span><span class="p">,</span><span class="w"> </span><span class="n">rte</span><span class="o">-></span><span class="n">rtekind</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">get_attname</span><span class="p">(</span><span class="n">rte</span><span class="o">-></span><span class="n">relid</span><span class="p">,</span><span class="w"> </span><span class="n">var</span><span class="o">-></span><span class="n">varattno</span><span class="p">,</span><span class="w"> </span><span class="nb">false</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">);</span>
<span class="w"> </span><span class="n">pfree</span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
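<p>The <code>var->varno-1</code> above is there because range table indexes count
from 1 while <code>list_nth</code> counts from 0. Attribute numbers for ordinary
columns are also 1-based. Here is a small standalone sketch of that double
lookup. The <code>MiniRangeTblEntry</code> struct and <code>mini_var_name</code> function
are hypothetical names of mine, with plain arrays standing in for the range
table and the catalog.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a range table entry plus its column names. */
typedef struct {
  const char *relname;
  const char *const *attnames; /* column names in definition order */
  int natts;
} MiniRangeTblEntry;

/* varno and varattno are 1-based here, as in Postgres; C arrays are
 * 0-based. Returns NULL when either index is out of range, which in this
 * sketch also covers attribute number 0 (the "whole-row" case). */
static const char *mini_var_name(const MiniRangeTblEntry *rtable, int nrte,
                                 int varno, int varattno) {
  assert(rtable != NULL);
  if (varno < 1 || varno > nrte) {
    return NULL;
  }
  const MiniRangeTblEntry *rte = &rtable[varno - 1];
  if (varattno < 1 || varattno > rte->natts) {
    return NULL;
  }
  return rte->attnames[varattno - 1];
}
```

<p>With a one-entry range table for <code>x</code> whose columns are <code>a</code> and
<code>b</code>, <code>mini_var_name(rtable, 1, 1, 1)</code> gives back <code>a</code>. In the
real extension <code>get_attname</code> does the second half of this against the
catalog for us.</p>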
<p>You'll also need to add another <code>#include</code> for <code>utils/lsyscache.h</code>.</p>
<p>Let's add the <code>case T_Var:</code> check in <code>buffer_print_expr</code>, and also
feed the <code>PlannedStmt*</code> through all the necessary <code>buffer_print_X</code>
functions:</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">Node</span><span class="o">*</span><span class="p">);</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_list</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">List</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="p">);</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_opexpr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">OpExpr</span><span class="o">*</span><span class="w"> </span><span class="n">op</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">HeapTuple</span><span class="w"> </span><span class="n">opertup</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">SearchSysCache1</span><span class="p">(</span><span class="n">OPEROID</span><span class="p">,</span><span class="w"> </span><span class="n">ObjectIdGetDatum</span><span class="p">(</span><span class="n">op</span><span class="o">-></span><span class="n">opno</span><span class="p">));</span>
<span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">lfirst</span><span class="p">(</span><span class="n">list_nth_cell</span><span class="p">(</span><span class="n">op</span><span class="o">-></span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)));</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">HeapTupleIsValid</span><span class="p">(</span><span class="n">opertup</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Form_pg_operator</span><span class="w"> </span><span class="n">operator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">Form_pg_operator</span><span class="p">)</span><span class="n">GETSTRUCT</span><span class="p">(</span><span class="n">opertup</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">" %s "</span><span class="p">,</span><span class="w"> </span><span class="n">NameStr</span><span class="p">(</span><span class="n">operator</span><span class="o">-></span><span class="n">oprname</span><span class="p">));</span>
<span class="w"> </span><span class="n">ReleaseSysCache</span><span class="p">(</span><span class="n">opertup</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"[Unknown operation: %d]"</span><span class="p">,</span><span class="w"> </span><span class="n">op</span><span class="o">-></span><span class="n">opno</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// TODO: Support single operand operations.</span>
<span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">lfirst</span><span class="p">(</span><span class="n">list_nth_cell</span><span class="p">(</span><span class="n">op</span><span class="o">-></span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)));</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_const</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">Const</span><span class="o">*</span><span class="w"> </span><span class="n">cnst</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">cnst</span><span class="o">-></span><span class="n">consttype</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">INT4OID</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Braces: before C23, a declaration cannot directly follow a case label.</span>
<span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">DatumGetInt32</span><span class="p">(</span><span class="n">cnst</span><span class="o">-></span><span class="n">constvalue</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"%d"</span><span class="p">,</span><span class="w"> </span><span class="n">val</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"[Unknown consttype oid: %d]"</span><span class="p">,</span><span class="w"> </span><span class="n">cnst</span><span class="o">-></span><span class="n">consttype</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_var</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">Var</span><span class="o">*</span><span class="w"> </span><span class="n">var</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="n">RangeTblEntry</span><span class="o">*</span><span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">list_nth</span><span class="p">(</span><span class="n">stmt</span><span class="o">-></span><span class="n">rtable</span><span class="p">,</span><span class="w"> </span><span class="n">var</span><span class="o">-></span><span class="n">varno</span><span class="mi">-1</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rte</span><span class="o">-></span><span class="n">rtekind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">RTE_RELATION</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">"[Unsupported relation type for var: %d]."</span><span class="p">,</span><span class="w"> </span><span class="n">rte</span><span class="o">-></span><span class="n">rtekind</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">get_attname</span><span class="p">(</span><span class="n">rte</span><span class="o">-></span><span class="n">relid</span><span class="p">,</span><span class="w"> </span><span class="n">var</span><span class="o">-></span><span class="n">varattno</span><span class="p">,</span><span class="w"> </span><span class="nb">false</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">);</span>
<span class="w"> </span><span class="n">pfree</span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">Node</span><span class="o">*</span><span class="w"> </span><span class="n">expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_Const</span><span class="p">:</span>
<span class="w"> </span><span class="n">buffer_print_const</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Const</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_Var</span><span class="p">:</span>
<span class="w"> </span><span class="n">buffer_print_var</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Var</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_OpExpr</span><span class="p">:</span>
<span class="w"> </span><span class="n">buffer_print_opexpr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">OpExpr</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"[Unknown: %d]"</span><span class="p">,</span><span class="w"> </span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_list</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">List</span><span class="o">*</span><span class="w"> </span><span class="n">list</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">sep</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ListCell</span><span class="o">*</span><span class="w"> </span><span class="n">cell</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">true</span><span class="p">;</span>
<span class="w"> </span><span class="n">foreach</span><span class="p">(</span><span class="n">cell</span><span class="p">,</span><span class="w"> </span><span class="n">list</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">first</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">sep</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Node</span><span class="o">*</span><span class="p">)</span><span class="n">lfirst</span><span class="p">(</span><span class="n">cell</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_where</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-></span><span class="n">qual</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">" WHERE "</span><span class="p">);</span>
<span class="w"> </span><span class="n">buffer_print_list</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-></span><span class="n">plannedstmt</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="o">-></span><span class="n">qual</span><span class="p">,</span><span class="w"> </span><span class="s">" AND "</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Rebuild and reinstall the extension, and restart Postgres. Re-run the
<code>test.sql</code> script. Check the Postgres server logs and you should see:</p>
<div class="highlight"><pre><span></span><span class="go">2023-11-19 20:03:14.351 GMT [3254458] LOG: [pgexec] SELECT [columns todo] FROM x WHERE a > 1</span>
<span class="go">2023-11-19 20:03:14.351 GMT [3254458] STATEMENT: SELECT a FROM x WHERE a > 1;</span>
</pre></div>
<p>Huzzah!</p>
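<p>The <code>first</code> flag in <code>buffer_print_list</code> is a common way to emit a separator between items but never before the first or after the last. That matters here because <code>plan-&gt;qual</code> is an implicitly-ANDed list. Below is a minimal standalone sketch of the same pattern outside of Postgres. The names <code>append_str</code> and <code>join_items</code> and the fixed-size output buffer are assumptions for illustration, not part of the extension.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of pgexec: append `src` to the
 * NUL-terminated string in `dst` (capacity `cap`, including the
 * terminator), truncating if it does not fit. */
static void append_str(char *dst, size_t cap, const char *src) {
  size_t used = strlen(dst);
  assert(used < cap);
  strncat(dst, src, cap - used - 1);
}

/* Join `n` items with `sep`, writing the separator only between items. */
static void join_items(char *dst, size_t cap, const char **items, size_t n,
                       const char *sep) {
  bool first = true;
  assert(cap > 0);
  dst[0] = '\0';
  for (size_t i = 0; i < n; i++) {
    if (!first) {
      append_str(dst, cap, sep);
    }
    first = false;
    append_str(dst, cap, items[i]);
  }
}
```

<p>Calling <code>join_items</code> with the separator <code>" AND "</code> mirrors what <code>buffer_print_where</code> does, and <code>", "</code> would mirror a column list.</p>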
<h3 id="walking-the-column-list">Walking the column list</h3><p>Let's get rid of <code>[columns todo]</code>. We already had the idea that <code>List*
targetlist</code> on the <code>Plan</code> struct was a list of expression
<code>Node</code>s. Let's try it.</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_select_columns</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-></span><span class="n">targetlist</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">buffer_print_list</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-></span><span class="n">plannedstmt</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="o">-></span><span class="n">targetlist</span><span class="p">,</span><span class="w"> </span><span class="s">", "</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Rebuild and reinstall the extension, and restart Postgres. Re-run the
<code>test.sql</code> script. Check the Postgres server logs and you should see:</p>
<div class="highlight"><pre><span></span><span class="go">2023-11-19 20:12:48.091 GMT [3255398] LOG: [pgexec] SELECT [Unknown: 53] FROM x WHERE a > 1</span>
<span class="go">2023-11-19 20:12:48.091 GMT [3255398] STATEMENT: SELECT a FROM x WHERE a > 1;</span>
</pre></div>
<p>Hmm. Let's look up <code>Node</code> <code>53</code> in Postgres:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>grep<span class="w"> </span><span class="s1">' = 53,'</span><span class="w"> </span>src/include/nodes/nodetags.h
<span class="go"> T_TargetEntry = 53,</span>
</pre></div>
<p>Based on the definition of
<a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/primnodes.h#L1918"><code>TargetEntry</code></a>,
it looks like we can ignore most of the fields (because we don't need
to handle <code>SELECT a AS b</code> yet) and just proxy the child <code>expr</code> field.</p>
<p>Let's add a <code>case T_TargetEntry</code> to <code>buffer_print_expr</code>:</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">Node</span><span class="o">*</span><span class="w"> </span><span class="n">expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_Const</span><span class="p">:</span>
<span class="w"> </span><span class="n">buffer_print_const</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Const</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_Var</span><span class="p">:</span>
<span class="w"> </span><span class="n">buffer_print_var</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Var</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_TargetEntry</span><span class="p">:</span>
<span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Node</span><span class="o">*</span><span class="p">)((</span><span class="n">TargetEntry</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">)</span><span class="o">-></span><span class="n">expr</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_OpExpr</span><span class="p">:</span>
<span class="w"> </span><span class="n">buffer_print_opexpr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">OpExpr</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
<span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">"[Unknown: %d]"</span><span class="p">,</span><span class="w"> </span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Rebuild and reinstall the extension, and restart Postgres. Re-run the
<code>test.sql</code> script. Check the Postgres server logs and:</p>
<div class="highlight"><pre><span></span><span class="go">2023-11-19 20:17:51.114 GMT [3257827] LOG: [pgexec] SELECT a FROM x WHERE a > 1</span>
<span class="go">2023-11-19 20:17:51.114 GMT [3257827] STATEMENT: SELECT a FROM x WHERE a > 1;</span>
</pre></div>
<p>We did it!</p>
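<p>The shape of <code>buffer_print_expr</code> is easier to see without the Postgres headers: every node carries a tag, the printer switches on it, and a wrapper node simply forwards to its child. Here is a minimal standalone sketch of that idea. <code>MiniNode</code>, <code>MiniTag</code>, and <code>mini_print</code> are hypothetical stand-ins I made up for illustration. They are not the real Postgres structs.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for Postgres node types; not the real structs. */
typedef enum { MINI_CONST, MINI_VAR, MINI_TARGET_ENTRY, MINI_OP } MiniTag;

typedef struct MiniNode {
  MiniTag tag;
  int value;              /* MINI_CONST: the integer literal */
  const char *name;       /* MINI_VAR: column name; MINI_OP: operator */
  struct MiniNode *left;  /* MINI_OP: left operand; MINI_TARGET_ENTRY: child */
  struct MiniNode *right; /* MINI_OP: right operand */
} MiniNode;

/* Append the text for `node` to the NUL-terminated string in `out`
 * (capacity `cap`), dispatching on the node's tag. */
static void mini_print(char *out, size_t cap, const MiniNode *node) {
  size_t used = strlen(out);
  assert(node != NULL && used < cap);
  switch (node->tag) {
  case MINI_CONST:
    snprintf(out + used, cap - used, "%d", node->value);
    break;
  case MINI_VAR:
    snprintf(out + used, cap - used, "%s", node->name);
    break;
  case MINI_TARGET_ENTRY:
    /* A wrapper node: forward to the wrapped expression. */
    mini_print(out, cap, node->left);
    break;
  case MINI_OP:
    mini_print(out, cap, node->left);
    used = strlen(out);
    snprintf(out + used, cap - used, " %s ", node->name);
    mini_print(out, cap, node->right);
    break;
  default:
    snprintf(out + used, cap - used, "[Unknown: %d]", (int)node->tag);
  }
}
```

<p>Wrapping an operator node in a <code>MINI_TARGET_ENTRY</code> and printing it produces the same text as printing the operator directly, which is all the <code>T_TargetEntry</code> case above does.</p>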
<h3 id="variations">Variations</h3><p>Let's try out some other queries to make sure this wasn't just luck.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>-h<span class="w"> </span>localhost<span class="w"> </span>postgres<span class="w"> </span>-c<span class="w"> </span><span class="s1">'SELECT a + 1 FROM x'</span>
<span class="go"> ?column?</span>
<span class="go">----------</span>
<span class="go"> 310</span>
<span class="gp gp-VirtualEnv">(1 row)</span>
<span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>-h<span class="w"> </span>localhost<span class="w"> </span>postgres<span class="w"> </span>-c<span class="w"> </span><span class="s1">'SELECT a + 1 FROM x WHERE 2 > a'</span>
<span class="go"> ?column?</span>
<span class="go">----------</span>
<span class="gp gp-VirtualEnv">(0 rows)</span>
</pre></div>
<p>And back in the Postgres server logs:</p>
<div class="highlight"><pre><span></span><span class="go">2023-11-19 20:19:28.057 GMT [3257874] LOG: [pgexec] SELECT a + 1 FROM x</span>
<span class="go">2023-11-19 20:19:28.057 GMT [3257874] STATEMENT: SELECT a + 1 FROM x</span>
<span class="go">2023-11-19 20:19:30.474 GMT [3257878] LOG: [pgexec] SELECT a + 1 FROM x WHERE 2 > a</span>
<span class="go">2023-11-19 20:19:30.474 GMT [3257878] STATEMENT: SELECT a + 1 FROM x WHERE 2 > a</span>
</pre></div>
<p>Not bad!</p>
<h3 id="next-steps">Next steps</h3><p>Printing out the statement here isn't incredibly useful. But it
establishes a basis for future work that might avoid Postgres's query
execution engine and do the execution ourselves, or proxy execution
to another system.</p>
<h3 id="postscript:-on-chatgpt">Postscript: On ChatGPT</h3><p>My recent Postgres explorations would have been basically impossible
if it weren't for being able to ask ChatGPT simple, stupid questions
like "How do I get from a Postgres <code>Var</code> to a column name".</p>
<p>It isn't always right. It doesn't always give great code. Actually, it
normally gives pretty weird code. But it's been extremely useful for
quick iteration when I get stuck.</p>
<p>The only other place the information exists is in small blog posts
around the internet, the Postgres mailing lists (which so far
haven't been super responsive for me), and the code itself.</p>
http://notes.eatonphil.com/2023-11-19-exploring-a-postgres-query-plan.htmlSun, 19 Nov 2023 00:00:00 +0000
- Writing a storage engine for Postgres: an in-memory Table Access Methodhttp://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html<p>With <a href="https://www.postgresql.org/docs/release/12.0/">Postgres 12</a>,
released in 2019, it became possible to <a href="https://www.pgcon.org/2019/schedule/attachments/536_pgcon2019_pluggable_table_AM_V1.3.pdf">swap out Postgres's storage
engine</a>.</p>
<p>This is a feature MySQL has supported for a long time. There are at
least <a href="https://github.com/eatonphil/pgtam">8 different</a> <em>built-in</em>
engines you can pick from. <a href="https://myrocks.io/">MyRocks</a>, MySQL on
RocksDB, is another popular third-party distribution.</p>
<p>I assume there will be a renaissance of Postgres storage engines. To
date, the efforts are
nascent. <a href="https://github.com/orioledb/orioledb">OrioleDB</a> and <a href="https://github.com/citusdata/citus/blob/main/src/backend/columnar/README.md">Citus
Columnar</a>
are two promising third-party table access methods being actively
developed.</p>
<h3 id="why-alternative-storage-engines?">Why alternative storage engines?</h3><p>The ability to swap storage engines is useful because different
workloads sometimes benefit from different storage
approaches. Analytics workloads and columnar storage layouts <a href="https://docs.aws.amazon.com/redshift/latest/dg/c_columnar_storage_disk_mem_mgmnt.html">go well
together</a>. Write-heavy
workloads and LSM trees <a href="https://github.com/wiredtiger/wiredtiger/wiki/Btree-vs-LSM">go well
together</a>. And
some people like in-memory storage for running integration tests.</p>
<p>By swapping out only the storage engine, you get the benefit of the
rest of the Postgres or MySQL infrastructure. The query language, the
wire protocol, the ecosystem, etc.</p>
<h3 id="why-not-foreign-data-wrappers?">Why not foreign data wrappers?</h3><p>Very little has been written about the difference between foreign data
wrappers (FDWs) and table access methods. Table access methods seem
to be the lower-level layer where presumably you get better
performance and cleaner integration. But there is clearly overlap
between these two extension options.</p>
<p>For example, there is an <a href="https://github.com/ildus/clickhouse_fdw">FDW for
ClickHouse</a>, so when you
create tables and rows and query the tables, you are really creating
and querying rows in a ClickHouse server. Similarly, there's an <a href="https://github.com/vidardb/pgrocks-fdw">FDW for
RocksDB</a>. And Citus's columnar
engine works
<a href="https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-compression-for-postgres/#:~:text=What%20About%20cstore_fdw%3F">either</a>
as a foreign data wrapper or a table access method.</p>
<p>The Citus page draws the clearest distinction between FDWs and table
access methods, but even that page is vague. Performance doesn't seem
to be the main difference. Closer integration, and thus the ability to
look more like vanilla Postgres from the outside, seems to be the
gist.</p>
<p>In any case, I wanted to explore the table access method API.</p>
<h3 id="digging-in">Digging in</h3><p>I haven't written Postgres extensions before and I've never written C
professionally. If you're familiar with Postgres internals or C and
notice something funky, please <a href="mailto:[email protected]">let me know</a>!</p>
<p>It turns out that almost no one has written about how to implement the
minimal table access methods for various storage engine operations. So
after quite a bit of stumbling to get the basics of an in-memory
storage engine working, I'm going to walk you through my approach.</p>
<p>This is prototype-quality code which hopefully will be a useful base
for further exploration.</p>
<p>All code for this post is <a href="https://github.com/eatonphil/pgtam">available on
GitHub</a>.</p>
<h3 id="a-debug-postgres-build">A debug Postgres build</h3><p>First off, let's make a <a href="https://wiki.postgresql.org/wiki/Developer_FAQ#Compile-time">debug
build</a> of
Postgres.</p>
<div class="highlight"><pre><span></span><span class="n">$</span><span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="k">clone</span><span class="w"> </span><span class="n">https</span><span class="o">://</span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">postgres</span><span class="o">/</span><span class="n">postgres</span>
<span class="n">$</span><span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="n">postgres</span>
<span class="n">$</span><span class="w"> </span><span class="c1"># An arbitrary commit from `master` after Postgres 16 that I am on</span>
<span class="n">$</span><span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="n">checkout</span><span class="w"> </span><span class="n">849172ff4883d44168f96f39d3fde96d0aa34c99</span>
<span class="n">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">configure</span><span class="w"> </span><span class="o">--</span><span class="k">enable</span><span class="o">-</span><span class="n">cassert</span><span class="w"> </span><span class="o">--</span><span class="k">enable</span><span class="o">-</span><span class="n">debug</span><span class="w"> </span><span class="n">CFLAGS</span><span class="o">=</span><span class="s2">"-ggdb -Og -g3 -fno-omit-frame-pointer"</span>
<span class="n">$</span><span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="o">-</span><span class="n">j8</span>
<span class="n">$</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="k">install</span>
</pre></div>
<p>This will install Postgres binaries (e.g. <code>psql</code>, <code>pg_ctl</code>, <code>initdb</code>,
<code>pg_config</code>) into <code>/usr/local/pgsql/bin</code>.</p>
<p>I'm going to reference those absolute paths throughout this post
because you might have a system (package manager) install of Postgres
already.</p>
<p>Let's create a database and start up this debug build:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/initdb<span class="w"> </span>test-db
<span class="gp">$ </span>/usr/local/pgsql/bin/pg_ctl<span class="w"> </span>-D<span class="w"> </span>test-db<span class="w"> </span>-l<span class="w"> </span>logfile<span class="w"> </span>start
</pre></div>
<h3 id="extension-infrastructure">Extension infrastructure</h3><p>Since we installed Postgres from scratch,
<code>/usr/local/pgsql/bin/pg_config</code> will supply all of the infrastructure
we need.</p>
<p>The "infrastructure" is basically just
<a href="https://www.postgresql.org/docs/current/extend-pgxs.html">PGXS</a>:
Postgres Makefile utilities.</p>
<p>It's convention-heavy. So in a new <code>Makefile</code> for this project we'll
specify:</p>
<ol>
<li><code>MODULES</code>: Any C sources to build, without the <code>.c</code> file extension</li>
<li><code>EXTENSION</code>: Extension metadata file, without the <code>.control</code> file extension</li>
<li><code>DATA</code>: A SQL file that is executed when the extension is created with <code>CREATE EXTENSION</code>, this time with the <code>.sql</code> extension</li>
</ol>
<div class="highlight"><pre><span></span><span class="nv">MODULES</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>pgtam
<span class="nv">EXTENSION</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>pgtam
<span class="nv">DATA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>pgtam--0.0.1.sql
<span class="nv">PG_CONFIG</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>/usr/local/pgsql/bin/pg_config
<span class="nv">PGXS</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">$(</span>shell<span class="w"> </span><span class="k">$(</span>PG_CONFIG<span class="k">)</span><span class="w"> </span>--pgxs<span class="k">)</span>
<span class="cp">include $(PGXS)</span>
</pre></div>
<p>The final three lines set up the PGXS Makefile library based on the
particular installed Postgres build we want to build the extension
against and install the extension to.</p>
<p>PGXS gives us a few important targets like <code>make distclean</code>, <code>make</code>,
and <code>make install</code> we'll use later on.</p>
<h4 id="<code>pgtam.c</code>"><code>pgtam.c</code></h4><p>A minimal C file that registers a function capable of serving as a
table access method is:</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"postgres.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"fmgr.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"access/tableam.h"</span>
<span class="n">PG_MODULE_MAGIC</span><span class="p">;</span>
<span class="k">const</span><span class="w"> </span><span class="n">TableAmRoutine</span><span class="w"> </span><span class="n">memam_methods</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">T_TableAmRoutine</span><span class="p">,</span>
<span class="p">};</span>
<span class="n">PG_FUNCTION_INFO_V1</span><span class="p">(</span><span class="n">mem_tableam_handler</span><span class="p">);</span>
<span class="n">Datum</span><span class="w"> </span><span class="nf">mem_tableam_handler</span><span class="p">(</span><span class="n">PG_FUNCTION_ARGS</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">PG_RETURN_POINTER</span><span class="p">(</span><span class="o">&</span><span class="n">memam_methods</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p class="note">
If you want to read about extension basics without the complexity of
table access methods, you can find a complete, minimal Postgres
extension I wrote to validate the
infrastructure <a href="https://github.com/eatonphil/pgext-101">here</a>. Or
you can follow a
<a href="https://github.com/IshaanAdarsh/Postgres-extension-tutorial/blob/main/SGML/intro_and_toc.md">larger
tutorial</a>.
</p><p>The workflow for registering a table access method is to first run
<code>CREATE EXTENSION pgtam</code>. This assumes <code>pgtam</code> is an extension that
has a function that returns a <code>TableAmRoutine</code> struct instance, a
table of callbacks that implement the access method.</p>
<p>Then you must run <code>CREATE ACCESS METHOD mem TYPE TABLE HANDLER
mem_tableam_handler</code>. And finally you can use the access method when
creating a table with <code>USING mem</code>: <code>CREATE TABLE x(a INT) USING mem</code>.</p>
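<p>It helps to think of <code>TableAmRoutine</code> as a struct of function pointers, with the handler returning a pointer to one static instance of it. Here is a minimal standalone sketch of that arrangement, with no Postgres headers. <code>MiniAmRoutine</code>, its two callbacks, and their signatures are hypothetical and heavily simplified. The real struct has many more callbacks with different signatures.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for TableAmRoutine. */
typedef struct MiniAmRoutine {
  int (*scan_begin)(const char *table);
  int (*tuple_insert)(const char *table, int value);
} MiniAmRoutine;

/* A do-nothing callback, just so one slot in the table is filled in. */
static int mem_scan_begin(const char *table) {
  (void)table;
  return 0;
}

/* Only one callback is set; members left out of a designated
 * initializer are NULL. */
static const MiniAmRoutine mini_mem_methods = {
  .scan_begin = mem_scan_begin,
};

/* The "handler": hands back a pointer to the static callback table. */
static const MiniAmRoutine *mini_mem_handler(void) {
  return &mini_mem_methods;
}

/* Report whether every callback a caller relies on is present. */
static bool mini_routine_complete(const MiniAmRoutine *r) {
  assert(r != NULL);
  return r->scan_begin != NULL && r->tuple_insert != NULL;
}
```

<p>Note that a partially filled table compiles without complaint, so a missing callback only shows up at runtime, when something checks for it or tries to call it.</p>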
<h4 id="<code>pgtam.control</code>"><code>pgtam.control</code></h4><p>This file contains extension metadata. At a minimum, it specifies the
default version of the extension and the path to the extension's shared
library once it is installed.</p>
<div class="highlight"><pre><span></span>default_version = '0.0.1'
module_pathname = '$libdir/pgtam'
</pre></div>
<h4 id="<code>pgtam--0.0.1.sql</code>"><code>pgtam--0.0.1.sql</code></h4><p>Finally, in <code>pgtam--0.0.1.sql</code> (which is executed when we call <code>CREATE
EXTENSION pgtam</code>), we register the handler function as a foreign
function, and then we register the function as an access method.</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">mem_tableam_handler</span><span class="p">(</span><span class="n">internal</span><span class="p">)</span>
<span class="k">RETURNS</span><span class="w"> </span><span class="n">table_am_handler</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s1">'pgtam'</span><span class="p">,</span><span class="w"> </span><span class="s1">'mem_tableam_handler'</span>
<span class="k">LANGUAGE</span><span class="w"> </span><span class="k">C</span><span class="w"> </span><span class="k">STRICT</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">ACCESS</span><span class="w"> </span><span class="k">METHOD</span><span class="w"> </span><span class="n">mem</span><span class="w"> </span><span class="k">TYPE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">HANDLER</span><span class="w"> </span><span class="n">mem_tableam_handler</span><span class="p">;</span>
</pre></div>
<h4 id="build">Build</h4><p>Now that we've got all the pieces in place, we can build and install
the extension.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>make
$<span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install
</pre></div>
<p>Let's add a <code>test.sql</code> script to exercise the extension:</p>
<div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">pgtam</span><span class="w"> </span><span class="k">CASCADE</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="n">pgtam</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">x</span><span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="nb">INT</span><span class="p">)</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">mem</span><span class="p">;</span>
</pre></div>
<p>And run it:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">psql:test.sql:3: server closed the connection unexpectedly</span>
<span class="go"> This probably means the server terminated abnormally</span>
<span class="go"> before or while processing the request.</span>
<span class="go">psql:test.sql:3: error: connection to server was lost</span>
</pre></div>
<p>Ok, so the server crashed and <code>psql</code> lost its connection! Let's look at the server logs. When we started
Postgres with <code>pg_ctl</code> we specified the log file as <code>logfile</code> in the
directory where we ran <code>pg_ctl</code>.</p>
<p>If we look through it we'll spot an assertion failure:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>grep<span class="w"> </span>Assert<span class="w"> </span>logfile
<span class="go">TRAP: failed Assert("routine->scan_begin != NULL"), File: "tableamapi.c", Line: 52, PID: 2906922</span>
</pre></div>
<p>That's a great sign! This is Postgres's assertion checking (active in
builds configured with <code>--enable-cassert</code>) helping to make sure the
table access method is correctly implemented.</p>
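<p>To make that check concrete, here is a minimal standalone sketch of the idea. It is not the Postgres code. <code>MiniAmRoutine</code>, <code>first_missing_callback</code>, <code>validate_or_abort</code> and the stub functions are hypothetical names made up for illustration. The real <code>TableAmRoutine</code> has many more callbacks, and Postgres uses its own <code>Assert</code> macro rather than the standard <code>assert</code> used here.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for TableAmRoutine:
 * a struct of callbacks. The real struct has many more fields. */
typedef struct MiniAmRoutine {
    void *(*scan_begin)(void);
    void (*scan_end)(void *scan);
    int (*scan_getnextslot)(void *scan);
} MiniAmRoutine;

/* Returns the name of the first required callback that is NULL,
 * or NULL when every required callback is set. */
static const char *first_missing_callback(const MiniAmRoutine *routine) {
    if (routine->scan_begin == NULL) return "scan_begin";
    if (routine->scan_end == NULL) return "scan_end";
    if (routine->scan_getnextslot == NULL) return "scan_getnextslot";
    return NULL;
}

/* Aborts the process if a required callback is missing, which is
 * roughly the behavior we just saw in the server log. */
static void validate_or_abort(const MiniAmRoutine *routine) {
    assert(first_missing_callback(routine) == NULL);
}

/* Do-nothing stubs that only exist to make the struct complete. */
static void *stub_scan_begin(void) { return NULL; }
static void stub_scan_end(void *scan) { (void)scan; }
static int stub_scan_getnextslot(void *scan) { (void)scan; return 0; }

/* Like our handler so far: no callbacks filled in. */
static const MiniAmRoutine empty_routine = {0};

/* Every callback points at a stub, so validation passes. */
static const MiniAmRoutine stubbed_routine = {
    .scan_begin = stub_scan_begin,
    .scan_end = stub_scan_end,
    .scan_getnextslot = stub_scan_getnextslot,
};
```

<p>Passing <code>empty_routine</code> to <code>validate_or_abort</code> would abort the process, much like the <code>TRAP</code> above. Passing <code>stubbed_routine</code> gets through, even though the stubs do nothing useful yet.</p>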
<h3 id="table-access-method-stubs">Table access method stubs</h3><p>The next step is to add function stubs for all the non-optional
methods of the <a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/include/access/tableam.h#L282"><code>TableAmRoutine</code>
struct</a>.</p>
<p>I've done all the work for you already so you can just copy this over
the existing <code>pgtam.c</code>. It's a big file, but don't worry. There's
nothing to explain. Just a bunch of blank functions returning default
values when required.</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"postgres.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"fmgr.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"access/tableam.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"access/heapam.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"nodes/execnodes.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"catalog/index.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"commands/vacuum.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"utils/builtins.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"executor/tuptable.h"</span>
<span class="n">PG_MODULE_MAGIC</span><span class="p">;</span>
<span class="k">const</span><span class="w"> </span><span class="n">TableAmRoutine</span><span class="w"> </span><span class="n">memam_methods</span><span class="p">;</span>
<span class="k">static</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">TupleTableSlotOps</span><span class="o">*</span><span class="w"> </span><span class="nf">memam_slot_callbacks</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="nf">memam_beginscan</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">nkeys</span><span class="p">,</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">ScanKeyData</span><span class="w"> </span><span class="o">*</span><span class="n">key</span><span class="p">,</span>
<span class="w"> </span><span class="n">ParallelTableScanDesc</span><span class="w"> </span><span class="n">parallel_scan</span><span class="p">,</span>
<span class="w"> </span><span class="n">uint32</span><span class="w"> </span><span class="n">flags</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_rescan</span><span class="p">(</span>
<span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">sscan</span><span class="p">,</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">ScanKeyData</span><span class="w"> </span><span class="o">*</span><span class="n">key</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">set_params</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">allow_strat</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">allow_sync</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">allow_pagemode</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_endscan</span><span class="p">(</span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">sscan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_getnextslot</span><span class="p">(</span>
<span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">sscan</span><span class="p">,</span>
<span class="w"> </span><span class="n">ScanDirection</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="n">IndexFetchTableData</span><span class="o">*</span><span class="w"> </span><span class="nf">memam_index_fetch_begin</span><span class="p">(</span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_index_fetch_reset</span><span class="p">(</span><span class="n">IndexFetchTableData</span><span class="w"> </span><span class="o">*</span><span class="n">scan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_index_fetch_end</span><span class="p">(</span><span class="n">IndexFetchTableData</span><span class="w"> </span><span class="o">*</span><span class="n">scan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_index_fetch_tuple</span><span class="p">(</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">IndexFetchTableData</span><span class="w"> </span><span class="o">*</span><span class="n">scan</span><span class="p">,</span>
<span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span><span class="p">,</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="o">*</span><span class="n">call_again</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="o">*</span><span class="n">all_dead</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_tuple_insert</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span>
<span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">options</span><span class="p">,</span>
<span class="w"> </span><span class="n">BulkInsertState</span><span class="w"> </span><span class="n">bistate</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_tuple_insert_speculative</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span>
<span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">options</span><span class="p">,</span>
<span class="w"> </span><span class="n">BulkInsertState</span><span class="w"> </span><span class="n">bistate</span><span class="p">,</span>
<span class="w"> </span><span class="n">uint32</span><span class="w"> </span><span class="n">specToken</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_tuple_complete_speculative</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span>
<span class="w"> </span><span class="n">uint32</span><span class="w"> </span><span class="n">specToken</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">succeeded</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_multi_insert</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">**</span><span class="n">slots</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">ntuples</span><span class="p">,</span>
<span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">options</span><span class="p">,</span>
<span class="w"> </span><span class="n">BulkInsertState</span><span class="w"> </span><span class="n">bistate</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span class="nf">memam_tuple_delete</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span>
<span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span><span class="p">,</span>
<span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">crosscheck</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">wait</span><span class="p">,</span>
<span class="w"> </span><span class="n">TM_FailureData</span><span class="w"> </span><span class="o">*</span><span class="n">tmfd</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">changingPart</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TM_Ok</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span class="nf">memam_tuple_update</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span>
<span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">otid</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span>
<span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">crosscheck</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">wait</span><span class="p">,</span>
<span class="w"> </span><span class="n">TM_FailureData</span><span class="w"> </span><span class="o">*</span><span class="n">tmfd</span><span class="p">,</span>
<span class="w"> </span><span class="n">LockTupleMode</span><span class="w"> </span><span class="o">*</span><span class="n">lockmode</span><span class="p">,</span>
<span class="w"> </span><span class="n">TU_UpdateIndexes</span><span class="w"> </span><span class="o">*</span><span class="n">update_indexes</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TM_Ok</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span class="nf">memam_tuple_lock</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span>
<span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span><span class="p">,</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span>
<span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span>
<span class="w"> </span><span class="n">LockTupleMode</span><span class="w"> </span><span class="n">mode</span><span class="p">,</span>
<span class="w"> </span><span class="n">LockWaitPolicy</span><span class="w"> </span><span class="n">wait_policy</span><span class="p">,</span>
<span class="w"> </span><span class="n">uint8</span><span class="w"> </span><span class="n">flags</span><span class="p">,</span>
<span class="w"> </span><span class="n">TM_FailureData</span><span class="w"> </span><span class="o">*</span><span class="n">tmfd</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TM_Ok</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_fetch_row_version</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span>
<span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span><span class="p">,</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_get_latest_tid</span><span class="p">(</span>
<span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">sscan</span><span class="p">,</span>
<span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_tuple_tid_valid</span><span class="p">(</span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span><span class="p">,</span><span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_tuple_satisfies_snapshot</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="nf">memam_index_delete_tuples</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span>
<span class="w"> </span><span class="n">TM_IndexDeleteOp</span><span class="w"> </span><span class="o">*</span><span class="n">delstate</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">InvalidTransactionId</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">id</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_relation_set_new_filelocator</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">RelFileLocator</span><span class="w"> </span><span class="o">*</span><span class="n">newrlocator</span><span class="p">,</span>
<span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">persistence</span><span class="p">,</span>
<span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="o">*</span><span class="n">freezeXid</span><span class="p">,</span>
<span class="w"> </span><span class="n">MultiXactId</span><span class="w"> </span><span class="o">*</span><span class="n">minmulti</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_relation_nontransactional_truncate</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_relation_copy_data</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">RelFileLocator</span><span class="w"> </span><span class="o">*</span><span class="n">newrlocator</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_relation_copy_for_cluster</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">OldHeap</span><span class="p">,</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">NewHeap</span><span class="p">,</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">OldIndex</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">use_sort</span><span class="p">,</span>
<span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="n">OldestXmin</span><span class="p">,</span>
<span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="o">*</span><span class="n">xid_cutoff</span><span class="p">,</span>
<span class="w"> </span><span class="n">MultiXactId</span><span class="w"> </span><span class="o">*</span><span class="n">multi_cutoff</span><span class="p">,</span>
<span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">num_tuples</span><span class="p">,</span>
<span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">tups_vacuumed</span><span class="p">,</span>
<span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">tups_recently_dead</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_vacuum_rel</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span>
<span class="w"> </span><span class="n">VacuumParams</span><span class="w"> </span><span class="o">*</span><span class="n">params</span><span class="p">,</span>
<span class="w"> </span><span class="n">BufferAccessStrategy</span><span class="w"> </span><span class="n">bstrategy</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_scan_analyze_next_block</span><span class="p">(</span>
<span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span><span class="p">,</span>
<span class="w"> </span><span class="n">BlockNumber</span><span class="w"> </span><span class="n">blockno</span><span class="p">,</span>
<span class="w"> </span><span class="n">BufferAccessStrategy</span><span class="w"> </span><span class="n">bstrategy</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_scan_analyze_next_tuple</span><span class="p">(</span>
<span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span><span class="p">,</span>
<span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="n">OldestXmin</span><span class="p">,</span>
<span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">liverows</span><span class="p">,</span>
<span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">deadrows</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="nf">memam_index_build_range_scan</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">heapRelation</span><span class="p">,</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">indexRelation</span><span class="p">,</span>
<span class="w"> </span><span class="n">IndexInfo</span><span class="w"> </span><span class="o">*</span><span class="n">indexInfo</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">allow_sync</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">anyvisible</span><span class="p">,</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">progress</span><span class="p">,</span>
<span class="w"> </span><span class="n">BlockNumber</span><span class="w"> </span><span class="n">start_blockno</span><span class="p">,</span>
<span class="w"> </span><span class="n">BlockNumber</span><span class="w"> </span><span class="n">numblocks</span><span class="p">,</span>
<span class="w"> </span><span class="n">IndexBuildCallback</span><span class="w"> </span><span class="n">callback</span><span class="p">,</span>
<span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">callback_state</span><span class="p">,</span>
<span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_index_validate_scan</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">heapRelation</span><span class="p">,</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">indexRelation</span><span class="p">,</span>
<span class="w"> </span><span class="n">IndexInfo</span><span class="w"> </span><span class="o">*</span><span class="n">indexInfo</span><span class="p">,</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span>
<span class="w"> </span><span class="n">ValidateIndexState</span><span class="w"> </span><span class="o">*</span><span class="n">state</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_relation_needs_toast_table</span><span class="p">(</span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="nf">memam_relation_toast_am</span><span class="p">(</span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">oid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">oid</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_fetch_toast_slice</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">toastrel</span><span class="p">,</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">valueid</span><span class="p">,</span>
<span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">attrsize</span><span class="p">,</span>
<span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">sliceoffset</span><span class="p">,</span>
<span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">slicelength</span><span class="p">,</span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">varlena</span><span class="w"> </span><span class="o">*</span><span class="n">result</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_estimate_rel_size</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span>
<span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="o">*</span><span class="n">attr_widths</span><span class="p">,</span>
<span class="w"> </span><span class="n">BlockNumber</span><span class="w"> </span><span class="o">*</span><span class="n">pages</span><span class="p">,</span>
<span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">tuples</span><span class="p">,</span>
<span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">allvisfrac</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_scan_sample_next_block</span><span class="p">(</span>
<span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span><span class="p">,</span><span class="w"> </span><span class="n">SampleScanState</span><span class="w"> </span><span class="o">*</span><span class="n">scanstate</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_scan_sample_next_tuple</span><span class="p">(</span>
<span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span><span class="p">,</span>
<span class="w"> </span><span class="n">SampleScanState</span><span class="w"> </span><span class="o">*</span><span class="n">scanstate</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">const</span><span class="w"> </span><span class="n">TableAmRoutine</span><span class="w"> </span><span class="n">memam_methods</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">T_TableAmRoutine</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">slot_callbacks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_slot_callbacks</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">scan_begin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_beginscan</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">scan_end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_endscan</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">scan_rescan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_rescan</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">scan_getnextslot</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_getnextslot</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">parallelscan_estimate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table_block_parallelscan_estimate</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">parallelscan_initialize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table_block_parallelscan_initialize</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">parallelscan_reinitialize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table_block_parallelscan_reinitialize</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">index_fetch_begin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_fetch_begin</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">index_fetch_reset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_fetch_reset</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">index_fetch_end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_fetch_end</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">index_fetch_tuple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_fetch_tuple</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">tuple_insert</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_insert</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">tuple_insert_speculative</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_insert_speculative</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">tuple_complete_speculative</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_complete_speculative</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">multi_insert</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_multi_insert</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">tuple_delete</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_delete</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">tuple_update</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_update</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">tuple_lock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_lock</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">tuple_fetch_row_version</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_fetch_row_version</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">tuple_get_latest_tid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_get_latest_tid</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">tuple_tid_valid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_tid_valid</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">tuple_satisfies_snapshot</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_satisfies_snapshot</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">index_delete_tuples</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_delete_tuples</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">relation_set_new_filelocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_set_new_filelocator</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">relation_nontransactional_truncate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_nontransactional_truncate</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">relation_copy_data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_copy_data</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">relation_copy_for_cluster</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_copy_for_cluster</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">relation_vacuum</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_vacuum_rel</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">scan_analyze_next_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_scan_analyze_next_block</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">scan_analyze_next_tuple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_scan_analyze_next_tuple</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">index_build_range_scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_build_range_scan</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">index_validate_scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_validate_scan</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">relation_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table_block_relation_size</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">relation_needs_toast_table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_needs_toast_table</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">relation_toast_am</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_toast_am</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">relation_fetch_toast_slice</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_fetch_toast_slice</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">relation_estimate_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_estimate_rel_size</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">scan_sample_next_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_scan_sample_next_block</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">scan_sample_next_tuple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_scan_sample_next_tuple</span>
<span class="p">};</span>
<span class="n">PG_FUNCTION_INFO_V1</span><span class="p">(</span><span class="n">mem_tableam_handler</span><span class="p">);</span>
<span class="n">Datum</span><span class="w"> </span><span class="nf">mem_tableam_handler</span><span class="p">(</span><span class="n">PG_FUNCTION_ARGS</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">PG_RETURN_POINTER</span><span class="p">(</span><span class="o">&</span><span class="n">memam_methods</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Let's build and test it!</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&&</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install
<span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span>
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">CREATE TABLE</span>
</pre></div>
<p>Hey, we're getting somewhere! It successfully created the table with
our custom table access method.</p>
<h3 id="querying-rows">Querying rows</h3><p>Next, let's try querying the table by adding a <code>SELECT a FROM x</code> to
<code>test.sql</code> and running it:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span>
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">CREATE TABLE</span>
<span class="go">psql:test.sql:6: server closed the connection unexpectedly</span>
<span class="go"> This probably means the server terminated abnormally</span>
<span class="go"> before or while processing the request.</span>
<span class="go">psql:test.sql:6: error: connection to server was lost</span>
</pre></div>
<p>This time <code>logfile</code> tells us only that the backend segfaulted while running the query, not where:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>tail<span class="w"> </span>-n15<span class="w"> </span>logfile
<span class="go">2023-11-01 18:43:32.449 UTC [2906199] LOG: database system is ready to accept connections</span>
<span class="go">2023-11-01 18:58:32.572 UTC [2907997] LOG: checkpoint starting: time</span>
<span class="go">2023-11-01 18:58:35.305 UTC [2907997] LOG: checkpoint complete: wrote 28 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=2.712 s, sync=0.015 s, total=2.733 s; sync files=23, longest=0.004 s, average=0.001 s; distance=128 kB, estimate=150 kB; lsn=0/15F88E0, redo lsn=0/15F8888</span>
<span class="go">2023-11-01 19:08:14.485 UTC [2906199] LOG: server process (PID 2908242) was terminated by signal 11: Segmentation fault</span>
<span class="go">2023-11-01 19:08:14.485 UTC [2906199] DETAIL: Failed process was running: SELECT a FROM x;</span>
<span class="go">2023-11-01 19:08:14.485 UTC [2906199] LOG: terminating any other active server processes</span>
<span class="go">2023-11-01 19:08:14.486 UTC [2906199] LOG: all server processes terminated; reinitializing</span>
<span class="go">2023-11-01 19:08:14.508 UTC [2908253] LOG: database system was interrupted; last known up at 2023-11-01 18:58:35 UTC</span>
<span class="go">2023-11-01 19:08:14.518 UTC [2908253] LOG: database system was not properly shut down; automatic recovery in progress</span>
<span class="go">2023-11-01 19:08:14.519 UTC [2908253] LOG: redo starts at 0/15F8888</span>
<span class="go">2023-11-01 19:08:14.520 UTC [2908253] LOG: invalid record length at 0/161DE70: expected at least 24, got 0</span>
<span class="go">2023-11-01 19:08:14.520 UTC [2908253] LOG: redo done at 0/161DE38 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s</span>
<span class="go">2023-11-01 19:08:14.521 UTC [2908254] LOG: checkpoint starting: end-of-recovery immediate wait</span>
<span class="go">2023-11-01 19:08:14.532 UTC [2908254] LOG: checkpoint complete: wrote 35 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.010 s, total=0.012 s; sync files=27, longest=0.003 s, average=0.001 s; distance=149 kB, estimate=149 kB; lsn=0/161DE70, redo lsn=0/161DE70</span>
<span class="go">2023-11-01 19:08:14.533 UTC [2906199] LOG: database system is ready to accept connections</span>
</pre></div>
<p>This was the first place I got stuck. How on earth do I figure out
which methods to implement? I mean, it's clearly one or more of the
methods on the struct. But there are so many of them.</p>
<p>I tried attaching <code>gdb</code> to the backend process whose PID
<code>SELECT pg_backend_pid()</code> reported for a <code>psql</code> session and setting
breakpoints there, but no breakpoint on any of my methods ever seemed to be hit.</p>
<p>So I went with the low-tech solution: I opened a file, <code>/tmp/pgtam.log</code>,
turned off buffering on it, and added a log call to every method on the
<code>TableAmRoutine</code> struct:</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -12,9 +12,13 @@</span>
<span class="w"> </span>const TableAmRoutine memam_methods;
<span class="gi">+FILE* fd;</span>
<span class="gi">+#define DEBUG_FUNC() fprintf(fd, "in %s\n", __func__)</span>
<span class="gi">+</span>
<span class="w"> </span>static const TupleTableSlotOps* memam_slot_callbacks(
<span class="w"> </span> Relation relation
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return NULL;
<span class="w"> </span>}
<span class="gu">@@ -26,6 +30,7 @@</span>
<span class="w"> </span> ParallelTableScanDesc parallel_scan,
<span class="w"> </span> uint32 flags
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return NULL;
<span class="w"> </span>}
<span class="gu">@@ -37,9 +42,11 @@</span>
<span class="w"> </span> bool allow_sync,
<span class="w"> </span> bool allow_pagemode
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_endscan(TableScanDesc sscan) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static bool memam_getnextslot(
<span class="gu">@@ -47,17 +54,21 @@</span>
<span class="w"> </span> ScanDirection direction,
<span class="w"> </span> TupleTableSlot *slot
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return false;
<span class="w"> </span>}
<span class="w"> </span>static IndexFetchTableData* memam_index_fetch_begin(Relation rel) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return NULL;
<span class="w"> </span>}
<span class="w"> </span>static void memam_index_fetch_reset(IndexFetchTableData *scan) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_index_fetch_end(IndexFetchTableData *scan) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static bool memam_index_fetch_tuple(
<span class="gu">@@ -68,6 +79,7 @@</span>
<span class="w"> </span> bool *call_again,
<span class="w"> </span> bool *all_dead
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return false;
<span class="w"> </span>}
<span class="gu">@@ -78,6 +90,7 @@</span>
<span class="w"> </span> int options,
<span class="w"> </span> BulkInsertState bistate
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_tuple_insert_speculative(
<span class="gu">@@ -87,6 +100,7 @@</span>
<span class="w"> </span> int options,
<span class="w"> </span> BulkInsertState bistate,
<span class="w"> </span> uint32 specToken) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_tuple_complete_speculative(
<span class="gu">@@ -94,6 +108,7 @@</span>
<span class="w"> </span> TupleTableSlot *slot,
<span class="w"> </span> uint32 specToken,
<span class="w"> </span> bool succeeded) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_multi_insert(
<span class="gu">@@ -104,6 +119,7 @@</span>
<span class="w"> </span> int options,
<span class="w"> </span> BulkInsertState bistate
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static TM_Result memam_tuple_delete(
<span class="gu">@@ -117,6 +133,7 @@</span>
<span class="w"> </span> bool changingPart
<span class="w"> </span>) {
<span class="w"> </span> TM_Result result = {};
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return result;
<span class="w"> </span>}
<span class="gu">@@ -133,6 +150,7 @@</span>
<span class="w"> </span> TU_UpdateIndexes *update_indexes
<span class="w"> </span>) {
<span class="w"> </span> TM_Result result = {};
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return result;
<span class="w"> </span>}
<span class="gu">@@ -148,6 +166,7 @@</span>
<span class="w"> </span> TM_FailureData *tmfd)
<span class="w"> </span>{
<span class="w"> </span> TM_Result result = {};
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return result;
<span class="w"> </span>}
<span class="gu">@@ -157,6 +176,7 @@</span>
<span class="w"> </span> Snapshot snapshot,
<span class="w"> </span> TupleTableSlot *slot
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return false;
<span class="w"> </span>}
<span class="gu">@@ -164,9 +184,11 @@</span>
<span class="w"> </span> TableScanDesc sscan,
<span class="w"> </span> ItemPointer tid
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static bool memam_tuple_tid_valid(TableScanDesc scan, ItemPointer tid) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return false;
<span class="w"> </span>}
<span class="gu">@@ -175,6 +197,7 @@</span>
<span class="w"> </span> TupleTableSlot *slot,
<span class="w"> </span> Snapshot snapshot
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return false;
<span class="w"> </span>}
<span class="gu">@@ -183,6 +206,7 @@</span>
<span class="w"> </span> TM_IndexDeleteOp *delstate
<span class="w"> </span>) {
<span class="w"> </span> TransactionId id = {};
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return id;
<span class="w"> </span>}
<span class="gu">@@ -193,17 +217,20 @@</span>
<span class="w"> </span> TransactionId *freezeXid,
<span class="w"> </span> MultiXactId *minmulti
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_relation_nontransactional_truncate(
<span class="w"> </span> Relation rel
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_relation_copy_data(
<span class="w"> </span> Relation rel,
<span class="w"> </span> const RelFileLocator *newrlocator
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_relation_copy_for_cluster(
<span class="gu">@@ -218,6 +245,7 @@</span>
<span class="w"> </span> double *tups_vacuumed,
<span class="w"> </span> double *tups_recently_dead
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_vacuum_rel(
<span class="gu">@@ -225,6 +253,7 @@</span>
<span class="w"> </span> VacuumParams *params,
<span class="w"> </span> BufferAccessStrategy bstrategy
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static bool memam_scan_analyze_next_block(
<span class="gu">@@ -232,6 +261,7 @@</span>
<span class="w"> </span> BlockNumber blockno,
<span class="w"> </span> BufferAccessStrategy bstrategy
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return false;
<span class="w"> </span>}
<span class="gu">@@ -242,6 +272,7 @@</span>
<span class="w"> </span> double *deadrows,
<span class="w"> </span> TupleTableSlot *slot
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return false;
<span class="w"> </span>}
<span class="gu">@@ -258,6 +289,7 @@</span>
<span class="w"> </span> void *callback_state,
<span class="w"> </span> TableScanDesc scan
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return 0;
<span class="w"> </span>}
<span class="gu">@@ -268,14 +300,17 @@</span>
<span class="w"> </span> Snapshot snapshot,
<span class="w"> </span> ValidateIndexState *state
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static bool memam_relation_needs_toast_table(Relation rel) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return false;
<span class="w"> </span>}
<span class="w"> </span>static Oid memam_relation_toast_am(Relation rel) {
<span class="w"> </span> Oid oid = {};
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return oid;
<span class="w"> </span>}
<span class="gu">@@ -287,6 +322,7 @@</span>
<span class="w"> </span> int32 slicelength,
<span class="w"> </span> struct varlena *result
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_estimate_rel_size(
<span class="gu">@@ -296,11 +332,13 @@</span>
<span class="w"> </span> double *tuples,
<span class="w"> </span> double *allvisfrac
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span>}
<span class="w"> </span>static bool memam_scan_sample_next_block(
<span class="w"> </span> TableScanDesc scan, SampleScanState *scanstate
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return false;
<span class="w"> </span>}
<span class="gu">@@ -309,6 +347,7 @@</span>
<span class="w"> </span> SampleScanState *scanstate,
<span class="w"> </span> TupleTableSlot *slot
<span class="w"> </span>) {
<span class="gi">+ DEBUG_FUNC();</span>
<span class="w"> </span> return false;
<span class="w"> </span>}
</pre></div>
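<p>The same tracing idea can be tried outside Postgres. Below is a minimal, standalone sketch, not the extension's code: every name in it (<code>TRACE_FUNC</code>, <code>trace_open</code>, <code>trace_last_line</code>, the <code>demo_*</code> stubs) is made up for illustration. It logs each function's name to an unbuffered file and then reads back the last line, which names the last function entered before anything went wrong.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* All names below are hypothetical, for illustration only. */
static FILE *trace_file = NULL;

/* Log the enclosing function's name. The do/while(0) wrapper makes the
 * macro a single statement, and the NULL check turns it into a no-op
 * when the trace file could not be opened. */
#define TRACE_FUNC()                                   \
    do {                                               \
        if (trace_file != NULL)                        \
            fprintf(trace_file, "in %s\n", __func__);  \
    } while (0)

/* Open the trace file for appending and disable stdio buffering so each
 * line is handed to the OS right away, before any later crash.
 * Returns 0 on success, -1 on failure. */
static int trace_open(const char *path) {
    trace_file = fopen(path, "a");
    if (trace_file == NULL)
        return -1;
    setvbuf(trace_file, NULL, _IONBF, 0);
    return 0;
}

/* Copy the last line of the file at path into buf, without its newline.
 * Assumes lines are shorter than size. Returns 0 on success, -1 if the
 * file is missing or empty. */
static int trace_last_line(const char *path, char *buf, size_t size) {
    FILE *in;
    int found = -1;

    assert(buf != NULL && size > 1);
    in = fopen(path, "r");
    if (in == NULL)
        return -1;
    /* A failed fgets() at end-of-file leaves buf holding the last line. */
    while (fgets(buf, (int)size, in) != NULL)
        found = 0;
    fclose(in);
    if (found == 0)
        buf[strcspn(buf, "\n")] = '\0';
    return found;
}

/* Two stand-ins for table access method callbacks. */
static int demo_scan_begin(void) { TRACE_FUNC(); return 0; }
static int demo_scan_next(void) { TRACE_FUNC(); return 0; }
```

<p>Calling <code>trace_open()</code> on a fresh file and then the two stubs in order leaves <code>in demo_scan_next</code> as the last line. Wrapping the macro in <code>do { } while (0)</code> is a design choice so that it behaves like one statement wherever it is used.</p>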
<p>And then in the entrypoint, initialize the file for logging.</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -369,5 +408,9 @@</span>
<span class="w"> </span>PG_FUNCTION_INFO_V1(mem_tableam_handler);
<span class="w"> </span>Datum mem_tableam_handler(PG_FUNCTION_ARGS) {
<span class="gi">+ fd = fopen("/tmp/pgtam.log", "a");</span>
<span class="gi">+ setvbuf(fd, NULL, _IONBF, 0); // Prevent buffering</span>
<span class="gi">+ fprintf(fd, "\n\nmem_tableam handler loaded\n");</span>
<span class="gi">+</span>
<span class="w"> </span> PG_RETURN_POINTER(&memam_methods);
<span class="w"> </span>}
</pre></div>
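<p>One thing this entrypoint glosses over: it calls <code>fopen()</code> every time it runs and never checks the result. The trace output further down shows the "handler loaded" message more than once, which suggests the handler can be invoked repeatedly. Here is a hedged sketch of an open-once variant, assuming those calls happen in the same process. The names <code>trace_get</code>, <code>trace_open_calls</code> and <code>demo_handler</code> are hypothetical, and <code>demo_handler</code> is only a stand-in for the real entrypoint, which returns a <code>Datum</code>.</p>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names, for illustration only. */
static FILE *trace_file = NULL;
static int trace_open_calls = 0; /* how many times we tried to open the file */

/* Return the shared trace stream, opening it on first use only.
 * Falls back to stderr if the file cannot be opened, so callers never
 * receive NULL. */
static FILE *trace_get(const char *path) {
    assert(path != NULL);
    if (trace_file == NULL) {
        trace_open_calls++;
        trace_file = fopen(path, "a");
        if (trace_file == NULL)
            trace_file = stderr;
        else
            setvbuf(trace_file, NULL, _IONBF, 0); /* prevent buffering */
    }
    return trace_file;
}

/* Stand-in for the handler entrypoint: logs on every call but opens the
 * file at most once per process. */
static void demo_handler(const char *path) {
    fprintf(trace_get(path), "\n\nhandler loaded\n");
}
```

<p>Calling <code>demo_handler()</code> three times writes three "handler loaded" entries but opens the file only once. Falling back to <code>stderr</code> is just one option; skipping the log entirely when the file cannot be opened would work too.</p>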
<p>Let's give it a shot!</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&&</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install
<span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span>
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">CREATE TABLE</span>
<span class="go">psql:test.sql:6: server closed the connection unexpectedly</span>
<span class="go"> This probably means the server terminated abnormally</span>
<span class="go"> before or while processing the request.</span>
<span class="go">psql:test.sql:6: error: connection to server was lost</span>
</pre></div>
<p>And let's check our log file:</p>
<div class="highlight"><pre><span></span><span class="o">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">pgtam</span><span class="o">.</span><span class="n">log</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="ow">in</span><span class="w"> </span><span class="n">memam_relation_set_new_filelocator</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="ow">in</span><span class="w"> </span><span class="n">memam_relation_needs_toast_table</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="ow">in</span><span class="w"> </span><span class="n">memam_estimate_rel_size</span>
<span class="ow">in</span><span class="w"> </span><span class="n">memam_slot_callbacks</span>
</pre></div>
<p>Now we're getting somewhere!</p>
<p class="note">
I later realized <code>elog()</code> is how most people log from
within Postgres and its extensions. I didn't know that when I was
getting started, though. This separate log file was a simple way to
get the info out.
</p><h4 id="<code>slot_callbacks</code>"><code>slot_callbacks</code></h4><p>Since the request crashes and the last logged function is
<code>memam_slot_callbacks</code>, it seems like that is where we should
concentrate. The <a href="https://www.postgresql.org/docs/current/tableam.html">table access method
docs</a> suggest
looking at the default <code>heap</code> access method for inspiration.</p>
<p>Its
<a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/backend/access/heap/heapam_handler.c#L67">version</a>
of <code>slot_callbacks</code> returns <code>&TTSOpsBufferHeapTuple</code>:</p>
<div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">TupleTableSlotOps</span><span class="w"> </span><span class="o">*</span>
<span class="nf">heapam_slot_callbacks</span><span class="p">(</span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="n">TTSOpsBufferHeapTuple</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>I have no idea what that means, but since it is defined in
<a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/backend/executor/execTuples.c#L1080"><code>src/backend/executor/execTuples.c</code></a>
it doesn't seem to be tied to the <code>heap</code> access method
implementation. Let's try it.</p>
<p class="note">
While <code>TTSOpsBufferHeapTuple</code> works initially, I noticed
later on that it is not the right choice here.
<code>TTSOpsVirtual</code> seems to be the right implementation, so
that is what the diff below uses.
</p><div class="highlight"><pre><span></span><span class="gu">@@ -19,7 +19,7 @@</span>
<span class="w"> </span> Relation relation
<span class="w"> </span>) {
<span class="w"> </span> DEBUG_FUNC();
<span class="gd">- return NULL;</span>
<span class="gi">+ return &TTSOpsVirtual;</span>
<span class="w"> </span>}
<span class="w"> </span>static TableScanDesc memam_beginscan(
</pre></div>
<p>Build and run:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&&</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install
<span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span>
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">CREATE TABLE</span>
<span class="go">psql:test.sql:6: server closed the connection unexpectedly</span>
<span class="go"> This probably means the server terminated abnormally</span>
<span class="go"> before or while processing the request.</span>
<span class="go">psql:test.sql:6: error: connection to server was lost</span>
</pre></div>
<p>It still crashes. But this time in <code>/tmp/pgtam.log</code> we made it into a
new method!</p>
<div class="highlight"><pre><span></span><span class="n">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">pgtam</span><span class="p">.</span><span class="n">log</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="n">in</span><span class="w"> </span><span class="n">memam_relation_set_new_filelocator</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="n">in</span><span class="w"> </span><span class="n">memam_relation_needs_toast_table</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="n">in</span><span class="w"> </span><span class="n">memam_estimate_rel_size</span>
<span class="n">in</span><span class="w"> </span><span class="n">memam_slot_callbacks</span>
<span class="n">in</span><span class="w"> </span><span class="n">memam_beginscan</span>
</pre></div>
<h4 id="<code>scan_begin</code>"><code>scan_begin</code></h4><p>The <code>heap</code> version of this method, <code>heap_beginscan</code>, has this signature:</p>
<div class="highlight"><pre><span></span><span class="n">TableScanDesc</span><span class="w"> </span><span class="nf">heap_beginscan</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span>
<span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">nkeys</span><span class="p">,</span>
<span class="w"> </span><span class="n">ScanKey</span><span class="w"> </span><span class="n">key</span><span class="p">,</span>
<span class="w"> </span><span class="n">ParallelTableScanDesc</span><span class="w"> </span><span class="n">parallel_scan</span><span class="p">,</span>
<span class="w"> </span><span class="n">uint32</span><span class="w"> </span><span class="n">flags</span>
<span class="p">);</span>
</pre></div>
<p>Since we only implemented stub versions of all the methods, we've been
returning <code>NULL</code> here. The crash now happens after this function is
entered, so maybe we should try returning something that isn't <code>NULL</code>.</p>
<p>By looking at the definition of <code>TableScanDesc</code>, we can see it is a
pointer to the <code>TableScanDescData</code> struct defined in
<a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/include/access/relscan.h#L52"><code>src/include/access/relscan.h</code></a>.</p>
<p>Let's <code>malloc</code> a <code>TableScanDescData</code>, free it in <code>endscan</code>, and return
the <code>TableScanDescData</code> instance in <code>beginscan</code>:</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -30,8 +30,12 @@</span>
<span class="w"> </span> ParallelTableScanDesc parallel_scan,
<span class="w"> </span> uint32 flags
<span class="w"> </span>) {
<span class="gi">+ TableScanDescData* scan = NULL;</span>
<span class="w"> </span> DEBUG_FUNC();
<span class="gd">- return NULL;</span>
<span class="gi">+</span>
<span class="gi">+ scan = (TableScanDescData*)malloc(sizeof(TableScanDescData));</span>
<span class="gi">+</span>
<span class="gi">+ return (TableScanDesc)scan;</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_rescan(
<span class="gu">@@ -87,6 +87,7 @@</span>
<span class="w"> </span>static void memam_endscan(TableScanDesc sscan) {
<span class="w"> </span> DEBUG_FUNC();
<span class="gi">+ free(sscan);</span>
<span class="w"> </span>}
</pre></div>
<p>Build and run (you can do it on your own). No difference.</p>
<p>I got stuck for a while here too. Clearly something must be filled out
in this struct but it could be anything. Through trial and error I
realized the one field that must be filled out is <code>scan->rs_rd</code>.</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -34,6 +34,7 @@</span>
<span class="w"> </span> DEBUG_FUNC();
<span class="w"> </span> scan = (TableScanDescData*)malloc(sizeof(TableScanDescData));
<span class="gi">+ scan->rs_rd = relation;</span>
<span class="w"> </span> return (TableScanDesc)scan;
<span class="w"> </span>}
</pre></div>
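<p>One reason trial and error was needed is that <code>malloc</code> leaves
every field of the struct uninitialized. A minimal sketch of a more
defensive version, using stand-in types rather than the real Postgres
ones (<code>MockRelation</code>, <code>MockScanDesc</code>,
<code>zeroed_beginscan</code> and <code>zeroed_endscan</code> are all
hypothetical names): zero the struct first, then set the one field we
know is required. Inside a real extension, <code>palloc0</code> is
probably the more usual way to get zeroed memory.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for Postgres types, for illustration only. */
typedef struct MockRelation {
    const char *name;
} MockRelation;

typedef struct MockScanDesc {
    MockRelation *rs_rd;   /* relation being scanned */
    void *rs_snapshot;     /* not used in this sketch, stays NULL */
    unsigned rs_flags;     /* not used in this sketch, stays 0 */
} MockScanDesc;

static MockScanDesc *zeroed_beginscan(MockRelation *relation) {
    /* calloc zeroes the allocation, so unset fields are NULL/0
     * instead of whatever happened to be in memory. */
    MockScanDesc *scan = calloc(1, sizeof(*scan));
    if (scan == NULL) {
        return NULL;
    }
    scan->rs_rd = relation;
    return scan;
}

static void zeroed_endscan(MockScanDesc *scan) {
    free(scan);
}
```

<p>With zeroed memory, a missing field tends to show up as a clean
<code>NULL</code> dereference rather than as behavior that changes from
run to run.</p>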
<p>We build and run:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>make<span class="w"> </span><span class="o">&&</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install
$<span class="w"> </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
psql:test.sql:1:<span class="w"> </span>NOTICE:<span class="w"> </span>drop<span class="w"> </span>cascades<span class="w"> </span>to<span class="w"> </span>table<span class="w"> </span>x
DROP<span class="w"> </span>EXTENSION
CREATE<span class="w"> </span>EXTENSION
CREATE<span class="w"> </span>TABLE
<span class="w"> </span>a
---
<span class="o">(</span><span class="m">0</span><span class="w"> </span>rows<span class="o">)</span>
</pre></div>
<p>And it works! It doesn't return anything but that's correct. There's
nothing to return.</p>
<p>So what if we actually want to return something? Let's check our logs
in <code>/tmp/pgtam.log</code>.</p>
<div class="highlight"><pre><span></span><span class="o">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">pgtam</span><span class="o">.</span><span class="n">log</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="ow">in</span><span class="w"> </span><span class="n">memam_relation_set_new_filelocator</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="ow">in</span><span class="w"> </span><span class="n">memam_relation_needs_toast_table</span>
<span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span>
<span class="ow">in</span><span class="w"> </span><span class="n">memam_estimate_rel_size</span>
<span class="ow">in</span><span class="w"> </span><span class="n">memam_slot_callbacks</span>
<span class="ow">in</span><span class="w"> </span><span class="n">memam_beginscan</span>
<span class="ow">in</span><span class="w"> </span><span class="n">memam_getnextslot</span>
<span class="ow">in</span><span class="w"> </span><span class="n">memam_endscan</span>
</pre></div>
<p>Ok, I'm getting the gist of the API. A full table scan (which this is,
because there are no indexes at play) starts with an initialization
for a slot, then the scan begins, then <code>getnextslot</code> is called for
each row, and then <code>endscan</code> is called to allow for cleanup.</p>
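<p>To make that call sequence concrete, here is a toy model of it in
plain C. Nothing here is the Postgres API: <code>MockScan</code>,
<code>MockMethods</code> and <code>run_full_scan</code> are hypothetical
names, and a "row" is just one <code>int</code>. The driver function
plays the role the executor plays for us.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified scan state. */
typedef struct MockScan {
    const int *rows;
    size_t nrows;
    size_t next;   /* index of the next row to hand out */
    bool ended;    /* set by endscan */
} MockScan;

/* A tiny table of callbacks, in the spirit of memam_methods. */
typedef struct MockMethods {
    void (*beginscan)(MockScan *scan, const int *rows, size_t nrows);
    bool (*getnextslot)(MockScan *scan, int *slot);
    void (*endscan)(MockScan *scan);
} MockMethods;

static void mock_beginscan(MockScan *scan, const int *rows, size_t nrows) {
    scan->rows = rows;
    scan->nrows = nrows;
    scan->next = 0;
    scan->ended = false;
}

/* Stores the next row in *slot and returns true, or returns false
 * once every row has been handed out. */
static bool mock_getnextslot(MockScan *scan, int *slot) {
    if (scan->next >= scan->nrows) {
        return false;
    }
    *slot = scan->rows[scan->next++];
    return true;
}

static void mock_endscan(MockScan *scan) {
    scan->ended = true;
}

static const MockMethods mock_methods = {
    mock_beginscan, mock_getnextslot, mock_endscan
};

/* Drives one full scan, sums the values, and counts getnextslot calls. */
static long run_full_scan(const MockMethods *am, const int *rows,
                          size_t nrows, size_t *ncalls) {
    MockScan scan;
    int slot = 0;
    long sum = 0;

    *ncalls = 0;
    am->beginscan(&scan, rows, nrows);
    for (;;) {
        (*ncalls)++;
        if (!am->getnextslot(&scan, &slot)) {
            break;
        }
        sum += slot;
    }
    am->endscan(&scan);
    assert(scan.ended);
    return sum;
}
```

<p>In this model a scan over <em>n</em> rows calls
<code>getnextslot</code> <em>n</em> + 1 times, because the last call is
the one that reports there is nothing left. That lines up with the
single <code>getnextslot</code> line in the log above for our empty
table.</p>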
<p>So let's try returning a row in <code>getnextslot</code>.</p>
<h4 id="<code>getnextslot</code>"><code>getnextslot</code></h4><p>The <code>getnextslot</code> signature is:</p>
<div class="highlight"><pre><span></span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_getnextslot</span><span class="p">(</span>
<span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">sscan</span><span class="p">,</span>
<span class="w"> </span><span class="n">ScanDirection</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span>
<span class="p">);</span>
</pre></div>
<p>So the <code>sscan</code> should be what we returned from <code>beginscan</code> and the
<a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/include/access/tableam.h#L341">interface
docs</a>
say the current row gets stored in <code>slot</code>.</p>
<p class="note">
The return value indicates whether a row was stored in
<code>slot</code>; returning <code>false</code> ends the scan. However,
the scan will still end even if you <code>return true</code> when the
<code>slot</code> is not filled out correctly. And if the
<code>slot</code> is filled out correctly and you unconditionally
<code>return true</code>, you will crash the process.
</p><p>Let's take a look at the
<a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/include/executor/tuptable.h#L114">definition</a>
of <code>TupleTableSlot</code>:</p>
<div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">TupleTableSlot</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">NodeTag</span><span class="w"> </span><span class="n">type</span><span class="p">;</span>
<span class="cp">#define FIELDNO_TUPLETABLESLOT_FLAGS 1</span>
<span class="w"> </span><span class="n">uint16</span><span class="w"> </span><span class="n">tts_flags</span><span class="p">;</span><span class="w"> </span><span class="cm">/* Boolean states */</span>
<span class="cp">#define FIELDNO_TUPLETABLESLOT_NVALID 2</span>
<span class="w"> </span><span class="n">AttrNumber</span><span class="w"> </span><span class="n">tts_nvalid</span><span class="p">;</span><span class="w"> </span><span class="cm">/* # of valid values in tts_values */</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">TupleTableSlotOps</span><span class="w"> </span><span class="o">*</span><span class="k">const</span><span class="w"> </span><span class="n">tts_ops</span><span class="p">;</span><span class="w"> </span><span class="cm">/* implementation of slot */</span>
<span class="cp">#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 4</span>
<span class="w"> </span><span class="n">TupleDesc</span><span class="w"> </span><span class="n">tts_tupleDescriptor</span><span class="p">;</span><span class="w"> </span><span class="cm">/* slot's tuple descriptor */</span>
<span class="cp">#define FIELDNO_TUPLETABLESLOT_VALUES 5</span>
<span class="w"> </span><span class="n">Datum</span><span class="w"> </span><span class="o">*</span><span class="n">tts_values</span><span class="p">;</span><span class="w"> </span><span class="cm">/* current per-attribute values */</span>
<span class="cp">#define FIELDNO_TUPLETABLESLOT_ISNULL 6</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="o">*</span><span class="n">tts_isnull</span><span class="p">;</span><span class="w"> </span><span class="cm">/* current per-attribute isnull flags */</span>
<span class="w"> </span><span class="n">MemoryContext</span><span class="w"> </span><span class="n">tts_mcxt</span><span class="p">;</span><span class="w"> </span><span class="cm">/* slot itself is in this context */</span>
<span class="w"> </span><span class="n">ItemPointerData</span><span class="w"> </span><span class="n">tts_tid</span><span class="p">;</span><span class="w"> </span><span class="cm">/* stored tuple's tid */</span>
<span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">tts_tableOid</span><span class="p">;</span><span class="w"> </span><span class="cm">/* table oid of tuple */</span>
<span class="p">}</span><span class="w"> </span><span class="n">TupleTableSlot</span><span class="p">;</span>
</pre></div>
<p><code>tts_values</code> is an array of <code>Datum</code> (which is a Postgres value). So
that sounds like the actual values of the row. The <code>tts_isnull</code> field
also looks important since that seems to be whether each value in the
row is null or not. And <code>tts_nvalid</code> sounds important too since
presumably it's the length of the <code>tts_isnull</code> and <code>tts_values</code>
arrays.</p>
<p>The rest of it may or may not be important. Let's try filling out
these three fields though and see what happens.</p>
<h4 id="datum">Datum</h4><p>Back in the <a href="https://www.postgresql.org/docs/current/xfunc-c.html">Postgres C extension
documentation</a>,
we can see some simple examples of converting between C types and
Postgres's Datum type.</p>
<p>For example:</p>
<div class="highlight"><pre><span></span><span class="n">Datum</span>
<span class="nf">add_one</span><span class="p">(</span><span class="n">PG_FUNCTION_ARGS</span><span class="p">)</span>
<span class="p">{</span>
<span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">arg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PG_GETARG_INT32</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="n">PG_RETURN_INT32</span><span class="p">(</span><span class="n">arg</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>If we look at the definition of <code>PG_RETURN_INT32</code> in
<a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/include/fmgr.h#L354"><code>src/include/fmgr.h</code></a>,
we see:</p>
<div class="highlight"><pre><span></span><span class="cp">#define PG_RETURN_INT32(x) return Int32GetDatum(x)</span>
</pre></div>
<p>So <code>Int32GetDatum()</code> is the function we'll use to set a <code>Datum</code> for a
cell in a row.</p>
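<p>As a rough mental model, a <code>Datum</code> is a machine-word-sized
value that either holds a small value directly or points at a larger
one. The sketch below models only the first case, and the names
(<code>MockDatum</code>, <code>mock_int32_get_datum</code>,
<code>mock_datum_get_int32</code>) are made up so they are not confused
with the real macros and functions in the Postgres headers.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model: a Datum as a pointer-sized unsigned integer. */
typedef uintptr_t MockDatum;

/* Pack a 32-bit integer into the word-sized container. */
static MockDatum mock_int32_get_datum(int32_t x) {
    return (MockDatum)x;
}

/* Unpack it again by narrowing back to 32 bits. */
static int32_t mock_datum_get_int32(MockDatum d) {
    return (int32_t)d;
}
```

<p>The point is that the container itself carries no type information.
Whoever reads a cell has to know from the table's column definition
which conversion to apply.</p>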
<div class="highlight"><pre><span></span><span class="gu">@@ -54,13 +54,26 @@</span>
<span class="w"> </span> DEBUG_FUNC();
<span class="w"> </span>}
<span class="gi">+static bool done = false;</span>
<span class="w"> </span>static bool memam_getnextslot(
<span class="w"> </span> TableScanDesc sscan,
<span class="w"> </span> ScanDirection direction,
<span class="w"> </span> TupleTableSlot *slot
<span class="w"> </span>) {
<span class="w"> </span> DEBUG_FUNC();
<span class="gd">- return false;</span>
<span class="gi">+</span>
<span class="gi">+ if (done) {</span>
<span class="gi">+ return false;</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ slot->tts_nvalid = 1;</span>
<span class="gi">+ slot->tts_values = (Datum*)malloc(sizeof(Datum) * slot->tts_nvalid);</span>
<span class="gi">+ slot->tts_values[0] = Int32GetDatum(314 /* Some unique-looking value */);</span>
<span class="gi">+ slot->tts_isnull = (bool*)malloc(sizeof(bool) * slot->tts_nvalid);</span>
<span class="gi">+ slot->tts_isnull[0] = false;</span>
<span class="gi">+ done = true;</span>
<span class="gi">+</span>
<span class="gi">+ return true;</span>
<span class="w"> </span>}
<span class="w"> </span>static IndexFetchTableData* memam_index_fetch_begin(Relation rel) {
</pre></div>
<p>The goal is to return a single row and then end the scan. The row
will have one 32-bit integer cell with the value <code>314</code>
(remember we created the table with <code>CREATE TABLE x (a INT)</code>;
<code>INT</code> is shorthand for <code>INT4</code>, which is a 32-bit integer).</p>
<p>But if we build and run this, we get no rows.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&&</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install
<span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span>
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">CREATE TABLE</span>
<span class="go"> a</span>
<span class="go">---</span>
<span class="gp gp-VirtualEnv">(0 rows)</span>
</pre></div>
<p>I got stuck for a while here. Plugging my <code>getnextslot</code> code into
ChatGPT helped. One thing it suggested was calling
<code>ExecStoreVirtualTuple</code> on the <code>slot</code>. I noticed that the built-in
<code>heap</code> access method <a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/backend/access/heap/heapam.c#L1159">also calls a function like
this</a>
in <code>getnextslot</code>.</p>
<p>And I realized that <code>tts_nvalid</code> is already set up and the memory for
<code>tts_values</code> and <code>tts_isnull</code> is already allocated. So the code became
a little simpler.</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -66,11 +66,9 @@</span>
<span class="w"> </span> return false;
<span class="w"> </span> }
<span class="gd">- slot->tts_nvalid = 1;</span>
<span class="gd">- slot->tts_values = (Datum*)malloc(sizeof(Datum) * slot->tts_nvalid);</span>
<span class="w"> </span> slot->tts_values[0] = Int32GetDatum(314 /* Some unique-looking value */);
<span class="gd">- slot->tts_isnull = (bool*)malloc(sizeof(bool) * slot->tts_nvalid);</span>
<span class="w"> </span> slot->tts_isnull[0] = false;
<span class="gi">+ ExecStoreVirtualTuple(slot);</span>
<span class="w"> </span> done = true;
<span class="w"> </span> return true;
</pre></div>
<p>Build and run:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&&</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install
<span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span>
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">CREATE TABLE</span>
<span class="go"> a</span>
<span class="go">-----</span>
<span class="go"> 314</span>
<span class="gp gp-VirtualEnv">(1 row)</span>
</pre></div>
<p>Fantastic!</p>
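<p>My understanding of why the extra call matters is that a slot carries
an "empty" flag alongside its values, and a reader ignores the values
while that flag is set. The following toy model shows the idea. It is
not the real <code>TupleTableSlot</code>: <code>MockSlot</code>, the
flag value and the helper names are all assumptions made for the
sketch.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_FLAG_EMPTY (1u << 1)  /* hypothetical bit, not the real value */
#define MOCK_NATTS 1               /* our table has a single column */

typedef struct MockSlot {
    uint16_t flags;
    int16_t nvalid;                /* number of valid entries in values */
    uintptr_t values[MOCK_NATTS];
    bool isnull[MOCK_NATTS];
} MockSlot;

/* Mark the slot as holding no row. */
static void mock_clear(MockSlot *slot) {
    slot->flags |= MOCK_FLAG_EMPTY;
    slot->nvalid = 0;
}

/* Plays the role of ExecStoreVirtualTuple in this model: the caller has
 * already written values/isnull, and this publishes them as a row. */
static void mock_store_virtual(MockSlot *slot) {
    assert(slot->flags & MOCK_FLAG_EMPTY);
    slot->flags &= (uint16_t)~MOCK_FLAG_EMPTY;
    slot->nvalid = MOCK_NATTS;
}

/* What a reader would check before looking at the values. */
static bool mock_is_empty(const MockSlot *slot) {
    return (slot->flags & MOCK_FLAG_EMPTY) != 0;
}
```

<p>In this model, writing to <code>values</code> and <code>isnull</code>
alone leaves the slot looking empty, which matches the
<code>(0 rows)</code> result we got before adding the call.</p>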
<h3 id="creating-a-table">Creating a table</h3><p>Now that we've proven we can return arbitrary hard-coded data, let's set up
infrastructure for storing tables in memory.</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -15,6 +15,41 @@</span>
<span class="w"> </span>FILE* fd;
<span class="w"> </span>#define DEBUG_FUNC() fprintf(fd, "in %s\n", __func__);
<span class="gi">+</span>
<span class="gi">+struct Column {</span>
<span class="gi">+ int value;</span>
<span class="gi">+};</span>
<span class="gi">+</span>
<span class="gi">+struct Row {</span>
<span class="gi">+ struct Column* columns;</span>
<span class="gi">+ size_t ncolumns;</span>
<span class="gi">+};</span>
<span class="gi">+</span>
<span class="gi">+#define MAX_ROWS 100</span>
<span class="gi">+struct Table {</span>
<span class="gi">+ char* name;</span>
<span class="gi">+ struct Row* rows;</span>
<span class="gi">+ size_t nrows;</span>
<span class="gi">+};</span>
<span class="gi">+</span>
<span class="gi">+#define MAX_TABLES 100</span>
<span class="gi">+struct Database {</span>
<span class="gi">+ struct Table* tables;</span>
<span class="gi">+ size_t ntables;</span>
<span class="gi">+};</span>
<span class="gi">+</span>
<span class="gi">+struct Database* database;</span>
<span class="gi">+</span>
<span class="gi">+static void get_table(struct Table** table, Relation relation) {</span>
<span class="gi">+ char* this_name = NameStr(relation->rd_rel->relname);</span>
<span class="gi">+ for (size_t i = 0; i < database->ntables; i++) {</span>
<span class="gi">+ if (strcmp(database->tables[i].name, this_name) == 0) {</span>
<span class="gi">+ *table = &database->tables[i];</span>
<span class="gi">+ return;</span>
<span class="gi">+ }</span>
<span class="gi">+ }</span>
<span class="gi">+}</span>
<span class="gi">+</span>
<span class="w"> </span>static const TupleTableSlotOps* memam_slot_callbacks(
<span class="w"> </span> Relation relation
<span class="w"> </span>) {
</pre></div>
<p>Based on what we logged in <code>/tmp/pgtam.log</code> it seems like
<code>memam_relation_set_new_filelocator</code> is called when a new table is
created. So let's handle adding a new table there.</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -233,7 +268,16 @@</span>
<span class="w"> </span> TransactionId *freezeXid,
<span class="w"> </span> MultiXactId *minmulti
<span class="w"> </span>) {
<span class="gi">+ struct Table table = {};</span>
<span class="w"> </span> DEBUG_FUNC();
<span class="gi">+</span>
<span class="gi">+ table.name = strdup(NameStr(rel->rd_rel->relname));</span>
<span class="gi">+ fprintf(fd, "Created table: [%s]\n", table.name);</span>
<span class="gi">+ table.rows = (struct Row*)malloc(sizeof(struct Row) * MAX_ROWS);</span>
<span class="gi">+ table.nrows = 0;</span>
<span class="gi">+</span>
<span class="gi">+ database->tables[database->ntables] = table;</span>
<span class="gi">+ database->ntables++;</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_relation_nontransactional_truncate(
</pre></div>
<p>Finally, we'll initialize the in-memory <code>Database*</code> when the handler is
loaded.</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -428,5 +472,11 @@</span>
<span class="w"> </span> setvbuf(fd, NULL, _IONBF, 0); // Prevent buffering
<span class="w"> </span> fprintf(fd, "\n\nmem_tableam handler loaded\n");
<span class="gi">+ if (database == NULL) {</span>
<span class="gi">+ database = (struct Database*)malloc(sizeof(struct Database));</span>
<span class="gi">+ database->ntables = 0;</span>
<span class="gi">+ database->tables = (struct Table*)malloc(sizeof(struct Table) * MAX_TABLES);</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="w"> </span> PG_RETURN_POINTER(&memam_methods);
<span class="w"> </span>}
</pre></div>
<p>If we build and run, we won't notice anything new.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&&</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install
<span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span>
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">CREATE TABLE</span>
<span class="go"> a</span>
<span class="go">-----</span>
<span class="go"> 314</span>
<span class="gp gp-VirtualEnv">(1 row)</span>
</pre></div>
<p>But we should see a message in <code>/tmp/pgtam.log</code> about the new table
being created.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>cat<span class="w"> </span>/tmp/pgtam.log
<span class="go">mem_tableam handler loaded</span>
<span class="go">mem_tableam handler loaded</span>
<span class="go">in memam_relation_set_new_filelocator</span>
<span class="go">Created table: [x]</span>
<span class="go">mem_tableam handler loaded</span>
<span class="go">in memam_relation_needs_toast_table</span>
<span class="go">mem_tableam handler loaded</span>
<span class="go">in memam_estimate_rel_size</span>
<span class="go">in memam_slot_callbacks</span>
<span class="go">in memam_beginscan</span>
<span class="go">in memam_getnextslot</span>
<span class="go">in memam_getnextslot</span>
<span class="go">in memam_endscan</span>
</pre></div>
<p>And there it is! Creation looks good.</p>
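<p>The diffs above append to a fixed-size array without checking
<code>MAX_TABLES</code>. Here is a standalone sketch of the same
registry with that check added. The <code>Sketch*</code> types and
<code>sketch_*</code> functions are hypothetical and only mirror the
shape of the structs in this post. The array is embedded in the struct
and the capacity is kept small to make the limit easy to exercise.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_TABLES 4

struct SketchTable {
    char *name;
    size_t nrows;
};

struct SketchDatabase {
    struct SketchTable tables[SKETCH_MAX_TABLES];
    size_t ntables;
};

/* Registers a table under a copy of `name`. Returns the new entry, or
 * NULL if the registry is full or the name could not be copied. */
static struct SketchTable *sketch_create_table(struct SketchDatabase *db,
                                               const char *name) {
    struct SketchTable *table;
    size_t len;

    if (db->ntables == SKETCH_MAX_TABLES) {
        return NULL;
    }
    len = strlen(name) + 1;
    table = &db->tables[db->ntables];
    table->name = malloc(len);
    if (table->name == NULL) {
        return NULL;
    }
    memcpy(table->name, name, len);
    table->nrows = 0;
    db->ntables++;
    return table;
}

/* Linear search by name, like get_table in the post. NULL if absent. */
static struct SketchTable *sketch_find_table(struct SketchDatabase *db,
                                             const char *name) {
    for (size_t i = 0; i < db->ntables; i++) {
        if (strcmp(db->tables[i].name, name) == 0) {
            return &db->tables[i];
        }
    }
    return NULL;
}
```

<p>Returning the pointer instead of writing through an out-parameter is
only a style preference. The lookup itself is the same linear scan by
name.</p>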
<h3 id="inserting-rows">Inserting rows</h3><p>Let's add <code>INSERT INTO x VALUES (23), (101);</code> to <code>test.sql</code> and run
the SQL script.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span>
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">CREATE TABLE</span>
<span class="go">INSERT 0 2</span>
<span class="go"> a</span>
<span class="go">-----</span>
<span class="go"> 314</span>
<span class="gp gp-VirtualEnv">(1 row)</span>
</pre></div>
<p>And let's check the log to see what method is called when we try to
<code>INSERT</code>.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>cat<span class="w"> </span>/tmp/pgtam.log
<span class="go">mem_tableam handler loaded</span>
<span class="go">mem_tableam handler loaded</span>
<span class="go">in memam_relation_set_new_filelocator</span>
<span class="go">Created table: [x]</span>
<span class="go">mem_tableam handler loaded</span>
<span class="go">in memam_relation_needs_toast_table</span>
<span class="go">mem_tableam handler loaded</span>
<span class="go">in memam_slot_callbacks</span>
<span class="go">in memam_tuple_insert</span>
<span class="go">in memam_tuple_insert</span>
<span class="go">in memam_estimate_rel_size</span>
<span class="go">in memam_slot_callbacks</span>
<span class="go">in memam_beginscan</span>
<span class="go">in memam_getnextslot</span>
<span class="go">in memam_getnextslot</span>
<span class="go">in memam_endscan</span>
</pre></div>
<p><code>tuple_insert</code> seems to be the method! Looks like it gets called once
for each row to insert. Perfect.</p>
<p>The signature for <code>tuple_insert</code> is:</p>
<div class="highlight"><pre><span></span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_tuple_insert</span><span class="p">(</span>
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span>
<span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span>
<span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">options</span><span class="p">,</span>
<span class="w"> </span><span class="n">BulkInsertState</span><span class="w"> </span><span class="n">bistate</span>
<span class="p">);</span>
</pre></div>
<p>We can get the table name from <code>relation</code>, and this time, rather than
writing to <code>slot</code>, we read from <code>slot->tts_values</code>.</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -141,7 +141,38 @@</span>
<span class="w"> </span> int options,
<span class="w"> </span> BulkInsertState bistate
<span class="w"> </span>) {
<span class="gi">+ TupleDesc desc = RelationGetDescr(relation);</span>
<span class="gi">+ struct Table* table = NULL;</span>
<span class="gi">+ struct Column column = {};</span>
<span class="gi">+ struct Row row = {};</span>
<span class="gi">+</span>
<span class="w"> </span> DEBUG_FUNC();
<span class="gi">+</span>
<span class="gi">+ get_table(&table, relation);</span>
<span class="gi">+ if (table == NULL) {</span>
<span class="gi">+ elog(ERROR, "table not found");</span>
<span class="gi">+ return;</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ if (table->nrows == MAX_ROWS) {</span>
<span class="gi">+ elog(ERROR, "cannot insert more rows");</span>
<span class="gi">+ return;</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ row.ncolumns = desc->natts;</span>
<span class="gi">+ Assert(slot->tts_nvalid == row.ncolumns);</span>
<span class="gi">+ Assert(row.ncolumns > 0);</span>
<span class="gi">+</span>
<span class="gi">+ row.columns = (struct Column*)malloc(sizeof(struct Column) * row.ncolumns);</span>
<span class="gi">+ for (size_t i = 0; i < row.ncolumns; i++) {</span>
<span class="gi">+ Assert(desc->attrs[i].atttypid == INT4OID);</span>
<span class="gi">+ column.value = DatumGetInt32(slot->tts_values[i]);</span>
<span class="gi">+ row.columns[i] = column;</span>
<span class="gi">+ fprintf(fd, "Got value: %d\n", column.value);</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ table->rows[table->nrows] = row;</span>
<span class="gi">+ table->nrows++;</span>
<span class="w"> </span>}
<span class="w"> </span>static void memam_tuple_insert_speculative(
</pre></div>
<p>Build and run. Again, we won't notice anything new in the query output.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&&</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install
<span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span>
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">CREATE TABLE</span>
<span class="go">INSERT 0 2</span>
<span class="go"> a</span>
<span class="go">-----</span>
<span class="go"> 314</span>
<span class="gp gp-VirtualEnv">(1 row)</span>
</pre></div>
<p>But if we check the logs, we should see the two column values we
inserted, one for each row.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>cat<span class="w"> </span>/tmp/pgtam.log
<span class="go">mem_tableam handler loaded</span>
<span class="go">mem_tableam handler loaded</span>
<span class="go">in memam_relation_set_new_filelocator</span>
<span class="go">Created table: [x]</span>
<span class="go">mem_tableam handler loaded</span>
<span class="go">in memam_relation_needs_toast_table</span>
<span class="go">mem_tableam handler loaded</span>
<span class="go">in memam_slot_callbacks</span>
<span class="go">in memam_tuple_insert</span>
<span class="go">Got value: 23</span>
<span class="go">in memam_tuple_insert</span>
<span class="go">Got value: 101</span>
<span class="go">in memam_estimate_rel_size</span>
<span class="go">in memam_slot_callbacks</span>
<span class="go">in memam_beginscan</span>
<span class="go">in memam_getnextslot</span>
<span class="go">in memam_getnextslot</span>
<span class="go">in memam_endscan</span>
</pre></div>
<p>Woohoo!</p>
<h3 id="un-hardcoding-the-scan">Un-hardcoding the scan</h3><p>The final thing we need to do is drop the hardcoded <code>314</code> we returned
from <code>getnextslot</code> and instead look up the current table
and return rows from it. This also means we need to keep track of
which row we're on, so <code>beginscan</code> will need to change slightly too.</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -57,6 +56,14 @@</span>
<span class="w"> </span> return &TTSOpsVirtual;
<span class="w"> </span>}
<span class="gi">+</span>
<span class="gi">+struct MemScanDesc {</span>
<span class="gi">+ TableScanDescData rs_base; // Base class from access/relscan.h.</span>
<span class="gi">+</span>
<span class="gi">+ // Custom data.</span>
<span class="gi">+ uint32 cursor;</span>
<span class="gi">+};</span>
<span class="gi">+</span>
<span class="w"> </span>static TableScanDesc memam_beginscan(
<span class="w"> </span> Relation relation,
<span class="w"> </span> Snapshot snapshot,
<span class="gu">@@ -65,11 +72,13 @@</span>
<span class="w"> </span> ParallelTableScanDesc parallel_scan,
<span class="w"> </span> uint32 flags
<span class="w"> </span>) {
<span class="gd">- TableScanDescData* scan = {};</span>
<span class="gd">- DEBUG_FUNC();</span>
<span class="gi">+ struct MemScanDesc* scan;</span>
<span class="gd">- scan = (TableScanDescData*)malloc(sizeof(TableScanDescData));</span>
<span class="gd">- scan->rs_rd = relation;</span>
<span class="gi">+ DEBUG_FUNC();</span>
<span class="gi">+</span>
<span class="gi">+ scan = (struct MemScanDesc*)malloc(sizeof(struct MemScanDesc));</span>
<span class="gi">+ scan->rs_base.rs_rd = relation;</span>
<span class="gi">+ scan->cursor = 0;</span>
<span class="w"> </span> return (TableScanDesc)scan;
<span class="w"> </span>}
<span class="gu">@@ -89,23 +97,26 @@</span>
<span class="w"> </span> DEBUG_FUNC();
<span class="w"> </span>}
<span class="gd">-static bool done = false;</span>
<span class="w"> </span>static bool memam_getnextslot(
<span class="w"> </span> TableScanDesc sscan,
<span class="w"> </span> ScanDirection direction,
<span class="w"> </span> TupleTableSlot *slot
<span class="w"> </span>) {
<span class="gi">+ struct MemScanDesc* mscan = (struct MemScanDesc*)sscan;</span>
<span class="gi">+ struct Table* table = NULL;</span>
<span class="w"> </span> DEBUG_FUNC();
<span class="gd">- if (done) {</span>
<span class="gi">+ ExecClearTuple(slot);</span>
<span class="gi">+</span>
<span class="gi">+ get_table(&table, mscan->rs_base.rs_rd);</span>
<span class="gi">+ if (table == NULL || mscan->cursor == table->nrows) {</span>
<span class="w"> </span> return false;
<span class="w"> </span> }
<span class="gd">- slot->tts_values[0] = Int32GetDatum(314 /* Some unique-looking value */);</span>
<span class="gi">+ slot->tts_values[0] = Int32GetDatum(table->rows[mscan->cursor].columns[0].value);</span>
<span class="w"> </span> slot->tts_isnull[0] = false;
<span class="w"> </span> ExecStoreVirtualTuple(slot);
<span class="gd">- done = true;</span>
<span class="gd">-</span>
<span class="gi">+ mscan->cursor++;</span>
<span class="w"> </span> return true;
<span class="w"> </span>}
</pre></div>
<p>Let's try it out.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&&</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install
<span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span>
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">CREATE TABLE</span>
<span class="go">INSERT 0 2</span>
<span class="go"> a</span>
<span class="go">-----</span>
<span class="go"> 23</span>
<span class="go"> 101</span>
<span class="gp gp-VirtualEnv">(2 rows)</span>
</pre></div>
<p>And there we have it. :)</p>
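<p>Stripped of the Postgres plumbing, the insert and scan logic above
boils down to appending rows to a fixed-capacity array and advancing a
cursor. Here is a minimal standalone sketch of that idea. The struct
layouts, the <code>MAX_ROWS</code> value, and the helper names
<code>table_insert</code> and <code>scan_next</code> are assumptions
made for illustration, not the extension's actual code.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ROWS 100 /* Assumed capacity for this sketch. */

/* Assumed layouts, modeled on the fields used in the diffs above. */
struct Column { int value; };
struct Row { struct Column *columns; size_t ncolumns; };
struct Table { struct Row rows[MAX_ROWS]; size_t nrows; };

/* Scan state: the table being read and the next row to return. */
struct Scan { struct Table *table; uint32_t cursor; };

/* Hypothetical helper: copy ncolumns ints into a new row at the end of
 * the table. Returns 0 on success, -1 if the table is full or the
 * allocation fails. */
int table_insert(struct Table *table, const int *values, size_t ncolumns) {
    struct Row row = {0};

    assert(ncolumns > 0);
    if (table->nrows == MAX_ROWS) {
        return -1;
    }

    row.ncolumns = ncolumns;
    row.columns = malloc(sizeof(struct Column) * ncolumns);
    if (row.columns == NULL) {
        return -1;
    }
    for (size_t i = 0; i < ncolumns; i++) {
        row.columns[i].value = values[i];
    }

    table->rows[table->nrows] = row;
    table->nrows++;
    return 0;
}

/* Hypothetical helper: store the first column of the next row in *out
 * and advance the cursor. Returns 1 if a row was produced, 0 when the
 * scan is finished. */
int scan_next(struct Scan *scan, int *out) {
    if (scan->table == NULL || scan->cursor == scan->table->nrows) {
        return 0;
    }
    *out = scan->table->rows[scan->cursor].columns[0].value;
    scan->cursor++;
    return 1;
}
```

<p>Because the cursor lives in the scan state rather than in a global,
two scans over the same table can each keep their own position, which is
what the hardcoded <code>done</code> flag could not do.</p>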
<h3 id="awesome-sql-power">Awesome SQL power</h3><p>So we tried one table and we tried a <code>SELECT</code> without anything else.</p>
<p>What happens if we use more of SQL? Let's create another table
and try some more complex queries. Edit <code>test.sql</code>:</p>
<div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">pgtam</span><span class="w"> </span><span class="k">CASCADE</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="n">pgtam</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">x</span><span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="nb">INT</span><span class="p">)</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">mem</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">y</span><span class="p">(</span><span class="n">b</span><span class="w"> </span><span class="nb">INT</span><span class="p">)</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">mem</span><span class="p">;</span>
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">23</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mi">101</span><span class="p">);</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">x</span><span class="p">;</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">23</span><span class="p">;</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span>
<span class="k">SELECT</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">y</span><span class="p">;</span>
</pre></div>
<p>Run it:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql
<span class="go">psql:test.sql:1: NOTICE: drop cascades to 2 other objects</span>
<span class="go">DETAIL: drop cascades to table x</span>
<span class="go">drop cascades to table y</span>
<span class="go">DROP EXTENSION</span>
<span class="go">CREATE EXTENSION</span>
<span class="go">CREATE TABLE</span>
<span class="go">CREATE TABLE</span>
<span class="go">INSERT 0 2</span>
<span class="go"> a</span>
<span class="go">-----</span>
<span class="go"> 23</span>
<span class="go"> 101</span>
<span class="gp gp-VirtualEnv">(2 rows)</span>
<span class="go"> ?column?</span>
<span class="go">----------</span>
<span class="go"> 123</span>
<span class="gp gp-VirtualEnv">(1 row)</span>
<span class="go"> a | count</span>
<span class="go">-----+-------</span>
<span class="go"> 23 | 1</span>
<span class="go"> 101 | 1</span>
<span class="gp gp-VirtualEnv">(2 rows)</span>
<span class="go"> b</span>
<span class="go">---</span>
<span class="gp gp-VirtualEnv">(0 rows)</span>
</pre></div>
<p>Pretty sweet!</p>
<h3 id="next-steps">Next steps</h3><p>It would be neat to build a storage engine that reads from and writes
to a CSV a la MySQL's CSV storage engine. Or a storage engine that
uses RocksDB.</p>
<p>It would also be good to figure out how indexes work, how deletions
work, and how updates and DDL beyond <code>CREATE</code> work.</p>
<p>And I should probably contribute some of this to the <a href="https://www.postgresql.org/docs/current/tableam.html">table access
method</a> docs
which are pretty sparse at the moment.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I've been working this week to understand Postgres Table Access Methods for alternative storage engines.<br><br>Especially challenging because the documentation is pretty sparse and few minimal implementations exist.<br><br>I wrote up my approach!<a href="https://t.co/LQGglRkev5">https://t.co/LQGglRkev5</a> <a href="https://t.co/v0MeOu4Hbr">pic.twitter.com/v0MeOu4Hbr</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1719873793693221157?ref_src=twsrc%5Etfw">November 2, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.htmlWed, 01 Nov 2023 00:00:00 +0000
- io_uring basics: Writing a file to diskhttp://notes.eatonphil.com/2023-10-19-write-file-to-disk-with-io_uring.html<p>King and I <a href="https://tigerbeetle.com/blog/a-friendly-abstraction-over-iouring-and-kqueue/">wrote a blog
post</a>
about building an event-driven cross-platform IO library that used
io_uring on Linux. We sketched out how it works at a high level but I
hadn't yet internalized how you actually code with io_uring. So I
strapped myself down this week and wrote <a href="https://github.com/eatonphil/io-playground">some
benchmarks</a> to build my
intuition about io_uring and other IO models.</p>
<p>I started with implementations in Go and ported them to Zig to make
sure I had done the Go versions decently. And I got some help from
King and other internetters to find some inefficiencies in my code.</p>
<p>This post will walk through my process, covering increasingly efficient
(and slightly more complex) ways to write an entire file to
disk with io_uring, from Go and Zig.</p>
<p>Notably, we're not going to <code>fsync()</code> and we're not going to use
<code>O_DIRECT</code>. So we won't be testing the entire IO pipeline from
userland to disk hardware but just how fast IO gets to the kernel. The
focus of this post is more on IO methods and using io_uring, not
absolute numbers.</p>
<p>All code for this post is <a href="https://github.com/eatonphil/io_uring-basics-writing-file">available on
GitHub</a>.</p>
<p class="note">
This code is going to indirectly show some differences in timing
 between Go and Zig. I couldn't care less about benchmarketing. And I
hope something about Zig vs Go is not what you take away from this
post either.
<br />
<br />
The goal is to build an intuition and be generally
correct. Observing the same relative behavior between
 implementations across two languages helps me gain confidence that what
 I'm doing is correct.
</p><h3 id="io_uring">io_uring</h3><p>With normal blocking syscalls you just call <code>read()</code> or <code>write()</code> and
wait for the results. io_uring is one of Linux's more powerful
<em>asynchronous</em> IO offerings. Unlike epoll, you can use io_uring with
both files and network connections. And unlike epoll, which only reports
when a file descriptor is ready, you can even have the operation itself
run in the kernel.</p>
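<p>For reference, the blocking approach mentioned above can be sketched
in C as a plain loop around <code>write()</code>. This is a minimal
sketch, not code from this post's repository: the helper name
<code>write_file_blocking</code> and the 4096-byte chunk size are
assumptions made for illustration.</p>

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_SIZE 4096 /* Assumed chunk size for this sketch. */

/* Hypothetical helper: write len bytes of data to path using blocking
 * write() calls. Returns 0 on success, -1 on error. */
int write_file_blocking(const char *path, const char *data, size_t len) {
    size_t done = 0;
    int fd;

    assert(data != NULL || len == 0);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;
    }

    while (done < len) {
        size_t want = len - done;
        ssize_t n;

        if (want > CHUNK_SIZE) {
            want = CHUNK_SIZE;
        }

        /* The calling thread waits here until the kernel has accepted
         * the bytes. One syscall per chunk. */
        n = write(fd, data + done, want);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* Interrupted by a signal; retry. */
            }
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    return close(fd);
}
```

<p>Every chunk costs one syscall and the caller does nothing else while
it runs. That per-call overhead is what batching through io_uring aims
to reduce.</p>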
<p>To interact with io_uring, you register a submission queue for syscalls
and their arguments. And you register a completion queue for syscall
results.</p>
<p>You can batch many syscalls in one single call to io_uring,
effectively turning up to N (4096 at most) syscalls into just one
syscall. The kernel still does all the work of the N syscalls but you
avoid some overhead.</p>
<p>As you check the completion queue and handle completed submissions,
some or all of the submission queue is freed up as well, and you can
add more submissions.</p>
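<p>To make the queue mechanics concrete, here is a toy model in plain
C. It is a simulation only and does not use the real io_uring or
liburing API: every name in it (<code>ToyRing</code>,
<code>toy_prep_write</code>, <code>toy_submit</code>,
<code>toy_reap</code>) is hypothetical, the "file" is an in-memory
buffer, and the queue depth of 4 is chosen to keep the example
small.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_DEPTH 4 /* Hypothetical queue depth, tiny for illustration. */

/* One queued write: copy len bytes from buf into the "file" at offset. */
struct ToyWrite {
    const char *buf;
    size_t len;
    size_t offset;
};

struct ToyRing {
    struct ToyWrite sq[TOY_DEPTH]; /* Submission queue. */
    size_t sq_len;
    int cq[TOY_DEPTH];             /* Completion queue: bytes written. */
    size_t cq_len;
    char *file;                    /* In-memory stand-in for a file. */
    size_t file_size;
    int enter_calls;               /* Number of pretend syscalls made. */
};

/* Queue a write without performing it. Returns -1 when every slot is
 * either queued or waiting to be reaped. */
int toy_prep_write(struct ToyRing *ring, const char *buf, size_t len,
                   size_t offset) {
    if (ring->sq_len + ring->cq_len == TOY_DEPTH) {
        return -1;
    }
    ring->sq[ring->sq_len++] = (struct ToyWrite){buf, len, offset};
    return 0;
}

/* One pretend syscall: perform every queued write and post one
 * completion per write. Returns how many writes were submitted. */
size_t toy_submit(struct ToyRing *ring) {
    size_t submitted = ring->sq_len;

    for (size_t i = 0; i < ring->sq_len; i++) {
        struct ToyWrite w = ring->sq[i];
        assert(w.offset + w.len <= ring->file_size);
        memcpy(ring->file + w.offset, w.buf, w.len);
        ring->cq[ring->cq_len++] = (int)w.len;
    }
    ring->sq_len = 0;
    ring->enter_calls++;
    return submitted;
}

/* Reap all completions, which frees their slots for new submissions.
 * Returns the total number of bytes the completed writes reported. */
size_t toy_reap(struct ToyRing *ring) {
    size_t total = 0;

    for (size_t i = 0; i < ring->cq_len; i++) {
        total += (size_t)ring->cq[i];
    }
    ring->cq_len = 0;
    return total;
}
```

<p>In this model, four queued writes cost a single call to
<code>toy_submit</code>, and no new write can be queued until
<code>toy_reap</code> has drained the completions. The real interface
has many more details, which the document linked below covers.</p>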
<p>For a more complete understanding, check out the kernel document
<a href="https://kernel.dk/io_uring.pdf">Efficient IO with io_uring</a>.</p>
<h3 id="io_uring-vs-liburing">io_uring vs liburing</h3><p>io_uring is a complex, low-level interface. Shuveb Hussain has <a href="https://unixism.net/2020/04/io-uring-by-example-part-1-introduction/">an
excellent
series</a>
on programming io_uring. But that was too low-level for me as I was
trying to figure out how to just get something working.</p>
<p>Instead, most people use <a href="https://github.com/axboe/liburing">liburing</a>
or a ported version of it like <a href="https://github.com/ziglang/zig/blob/master/lib/std/os/linux/io_uring.zig">the Zig standard library's
io_uring.zig</a>
or <a href="https://github.com/Iceber/iouring-go">Iceber's iouring-go</a>.</p>
<p>io_uring started clicking for me when I tried out the iouring-go
library. So we'll start there.</p>
<h3 id="boilerplate">Boilerplate</h3><p>First off, let's set up some boilerplate for the Go and Zig code.</p>
<p>In main.go add:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"bytes"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"time"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">b</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"assert"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">BUFFER_SIZE</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">4096</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">readNBytes</span><span class="p">(</span><span class="nx">fn</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="nx">fn</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="nx">data</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">buffer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">BUFFER_SIZE</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">read</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">data</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">data</span><span class="p">,</span><span class="w"> </span><span class="nx">buffer</span><span class="p">[:</span><span class="nx">read</span><span class="p">]</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">data</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">benchmark</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">data</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">fn</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="o">*</span><span class="nx">os</span><span class="p">.</span><span class="nx">File</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"%s"</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">)</span>
<span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">OpenFile</span><span class="p">(</span><span class="s">"out.bin"</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">O_RDWR</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">O_CREATE</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">O_TRUNC</span><span class="p">,</span><span class="w"> </span><span class="mo">0755</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">t1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">()</span>
<span class="w"> </span><span class="nx">fn</span><span class="p">(</span><span class="nx">f</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">Sub</span><span class="p">(</span><span class="nx">t1</span><span class="p">).</span><span class="nx">Seconds</span><span class="p">()</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">",%f,%f\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">))</span><span class="o">/</span><span class="nx">s</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Equal</span><span class="p">(</span><span class="nx">readNBytes</span><span class="p">(</span><span class="s">"out.bin"</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">)),</span><span class="w"> </span><span class="nx">data</span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
<p>And in main.zig add:</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">OUT_FILE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"out.bin"</span><span class="p">;</span>
<span class="kr">const</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="o">:</span><span class="w"> </span><span class="kt">u64</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">4096</span><span class="p">;</span>
<span class="k">fn</span><span class="w"> </span><span class="n">readNBytes</span><span class="p">(</span>
<span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span>
<span class="w"> </span><span class="n">filename</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">n</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">cwd</span><span class="p">().</span><span class="n">openFile</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">buf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">written</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">written</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">nwritten</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="n">buf</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">nwritten</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"> </span><span class="c1">// Hit EOF before reading n bytes.</span>
<span class="w"> </span><span class="nb">@memcpy</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">written</span><span class="p">..][</span><span class="mi">0</span><span class="p">..</span><span class="n">nwritten</span><span class="p">],</span><span class="w"> </span><span class="n">buf</span><span class="p">[</span><span class="mi">0</span><span class="p">..</span><span class="n">nwritten</span><span class="p">]);</span>
<span class="w"> </span><span class="n">written</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">nwritten</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">written</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">n</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">data</span><span class="p">;</span>
<span class="p">}</span>
<span class="kr">const</span><span class="w"> </span><span class="n">Benchmark</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">t</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">time</span><span class="p">.</span><span class="n">Timer</span><span class="p">,</span>
<span class="w"> </span><span class="n">file</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">File</span><span class="p">,</span>
<span class="w"> </span><span class="n">data</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span>
<span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">data</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">Benchmark</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">io</span><span class="p">.</span><span class="n">getStdOut</span><span class="p">().</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">"{s}"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">name</span><span class="p">});</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">cwd</span><span class="p">().</span><span class="n">createFile</span><span class="p">(</span><span class="n">OUT_FILE</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">truncate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">time</span><span class="p">.</span><span class="n">Timer</span><span class="p">.</span><span class="n">start</span><span class="p">(),</span>
<span class="w"> </span><span class="p">.</span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">file</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">allocator</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">stop</span><span class="p">(</span><span class="n">b</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">Benchmark</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@as</span><span class="p">(</span><span class="kt">f64</span><span class="p">,</span><span class="w"> </span><span class="nb">@floatFromInt</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">t</span><span class="p">.</span><span class="n">read</span><span class="p">()))</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">time</span><span class="p">.</span><span class="n">ns_per_s</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">io</span><span class="p">.</span><span class="n">getStdOut</span><span class="p">().</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span>
<span class="w"> </span><span class="s">",{d},{d}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="nb">@as</span><span class="p">(</span><span class="kt">f64</span><span class="p">,</span><span class="w"> </span><span class="nb">@floatFromInt</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">))</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">in</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">readNBytes</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">OUT_FILE</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">in</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">data</span><span class="p">));</span>
<span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">in</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">};</span>
</pre></div>
<h3 id="keep-it-simple:-write()">Keep it simple: write()</h3><p>Now let's add the naive version of writing bytes to disk: calling
<code>write()</code> repeatedly until all data has been written to disk.</p>
<p>In <code>main.go</code>:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">size</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">104857600</span><span class="w"> </span><span class="c1">// 100MiB</span>
<span class="w"> </span><span class="nx">data</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readNBytes</span><span class="p">(</span><span class="s">"/dev/random"</span><span class="p">,</span><span class="w"> </span><span class="nx">size</span><span class="p">)</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">RUNS</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">10</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">RUNS</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">benchmark</span><span class="p">(</span><span class="s">"blocking"</span><span class="p">,</span><span class="w"> </span><span class="nx">data</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">f</span><span class="w"> </span><span class="o">*</span><span class="nx">os</span><span class="p">.</span><span class="nx">File</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">BUFFER_SIZE</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">size</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">min</span><span class="p">(</span><span class="nx">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span><span class="o">-</span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">data</span><span class="p">[</span><span class="nx">i</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="nx">i</span><span class="o">+</span><span class="nx">size</span><span class="p">])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">size</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>And in <code>main.zig</code>:</p>
<div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">;</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">SIZE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">104857600</span><span class="p">;</span><span class="w"> </span><span class="c1">// 100MiB</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">readNBytes</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="s">"/dev/random"</span><span class="p">,</span><span class="w"> </span><span class="n">SIZE</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">data</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">RUNS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">run</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">run</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">RUNS</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">run</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="s">"blocking"</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">stop</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">size</span><span class="p">]);</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">size</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Let's build and run these programs and store the results as CSV that
we can analyze with DuckDB.</p>
<p>Go first:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>gomain<span class="w"> </span>main.go
$<span class="w"> </span>./gomain<span class="w"> </span>><span class="w"> </span>go.csv
$<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">"select column0 as method, avg(cast(column1 as double)) || 's' avg_time, format_bytes(avg(column2::double)::bigint) || '/s' as avg_throughput from 'go.csv' group by column0 order by avg(cast(column1 as double)) asc"</span>
</pre></div>
<table>
<thead><tr>
<th>method</th>
<th>avg_time</th>
<th>avg_throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>blocking</td>
<td>0.07251540000000001s</td>
<td>1.4GB/s</td>
</tr>
</tbody>
</table>
<p>And Zig:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig
$<span class="w"> </span>./main<span class="w"> </span>><span class="w"> </span>zig.csv
$<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">"select column0 as method, avg(cast(column1 as double)) || 's' avg_time, format_bytes(avg(column2::double)::bigint) || '/s' as avg_throughput from 'zig.csv' group by column0 order by avg(cast(column1 as double)) asc"</span>
</pre></div>
<table>
<thead><tr>
<th>method</th>
<th>avg_time</th>
<th>avg_throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>blocking</td>
<td>0.0656907669s</td>
<td>1.5GB/s</td>
</tr>
</tbody>
</table>
<p>Alright, we've got a baseline now and both language implementations
are in the same ballpark.</p>
<p>Let's add a simple io_uring version!</p>
<h3 id="io_uring,-1-entry,-go">io_uring, 1 entry, Go</h3><p>The <a href="https://github.com/Iceber/iouring-go#quickstart">iouring-go</a>
library has really excellent documentation for getting started.</p>
<p>To keep it simple, we'll use io_uring with only 1 entry. Add the
following to <code>func main()</code> after the existing <code>benchmark()</code> call in
<code>main.go</code>:</p>
<div class="highlight"><pre><span></span><span class="n">benchmark</span><span class="p">(</span><span class="s">"io_uring"</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="n">func</span><span class="p">(</span><span class="n">f</span><span class="w"> </span><span class="o">*</span><span class="n">os</span><span class="p">.</span><span class="n">File</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">iour</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">iouring</span><span class="p">.</span><span class="n">New</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">panic</span><span class="p">(</span><span class="n">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">defer</span><span class="w"> </span><span class="n">iour</span><span class="p">.</span><span class="n">Close</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">data</span><span class="p">);</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nl">size</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="o">-</span><span class="n">i</span><span class="p">)</span>
<span class="w"> </span><span class="nl">prepRequest</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">iouring</span><span class="p">.</span><span class="n">Pwrite</span><span class="p">(</span><span class="kt">int</span><span class="p">(</span><span class="n">f</span><span class="p">.</span><span class="n">Fd</span><span class="p">()),</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">i</span><span class="o">+</span><span class="n">size</span><span class="p">],</span><span class="w"> </span><span class="n">uint64</span><span class="p">(</span><span class="n">i</span><span class="p">))</span>
<span class="w"> </span><span class="n">res</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">iour</span><span class="p">.</span><span class="n">SubmitRequest</span><span class="p">(</span><span class="n">prepRequest</span><span class="p">,</span><span class="w"> </span><span class="nb">nil</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">panic</span><span class="p">(</span><span class="n">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o"><-</span><span class="n">res</span><span class="p">.</span><span class="n">Done</span><span class="p">()</span>
<span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">ReturnInt</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">panic</span><span class="p">(</span><span class="n">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">n</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">})</span>
</pre></div>
<p>Note that <code>benchmark</code> takes care of <code>f.Seek(0)</code> before each run. It
also checks that the file contents match the input <code>data</code>, so every
run is validated for correctness.</p>
<p>Alright, let's run this new Go implementation with io_uring!</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>gomain
$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>tidy
$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>gomain<span class="w"> </span>main.go
$<span class="w"> </span>./gomain<span class="w"> </span>><span class="w"> </span>go.csv
$<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">"select column0 as method, avg(cast(column1 as double)) || 's' avg_time, format_bytes(avg(column2::double)::bigint) || '/s' as avg_throughput from 'go.csv' group by column0 order by avg(cast(column1 as double)) asc"</span>
</pre></div>
<table>
<thead><tr>
<th>method</th>
<th>avg_time</th>
<th>avg_throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>blocking</td>
<td>0.0811486s</td>
<td>1.3GB/s</td>
</tr>
<tr>
<td>io_uring</td>
<td>0.5083049999999999s</td>
<td>213.2MB/s</td>
</tr>
</tbody>
</table>
<p>Well that looks terrible.</p>
<p>Let's port it to Zig to see if we notice the same behavior there.</p>
<h3 id="io_uring,-1-entry,-zig">io_uring, 1 entry, Zig</h3><p>There isn't an official Zig tutorial on io_uring that I'm aware of, but
<a href="https://github.com/ziglang/zig/blob/master/lib/std/os/linux/io_uring.zig">io_uring.zig</a>
is easy enough to browse through, and the tests in that file
also show how to use it.</p>
<p>Now that we've explored a bit in Go, the basic gist should be
similar:</p>
<ul>
<li>initialize io_uring</li>
<li>submit an entry</li>
<li>wait for it to finish</li>
<li>move on</li>
</ul>
<p>Add the following to <code>fn main()</code> after the existing benchmark block in
<code>main.zig</code>:</p>
<div class="highlight"><pre><span></span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="s">"iouring"</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">stop</span><span class="p">();</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">entries</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">ring</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">IO_Uring</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">entries</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">handle</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">size</span><span class="p">],</span><span class="w"> </span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">submitted</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">submit_and_wait</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">submitted</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cqe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">copy_cqe</span><span class="p">();</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">err</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="p">.</span><span class="n">SUCCESS</span><span class="p">);</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@as</span><span class="p">(</span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="nb">@intCast</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="p">));</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Now build and run:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig
$<span class="w"> </span>./main<span class="w"> </span>><span class="w"> </span>zig.csv
$<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">"select column0 as method, avg(cast(column1 as double)) || 's' avg_time, format_bytes(avg(column2::double)::bigint) || '/s' as avg_throughput from 'zig.csv' group by column0 order by avg(cast(column1 as double)) asc"</span>
</pre></div>
<table>
<thead><tr>
<th>method</th>
<th>avg_time</th>
<th>avg_throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>blocking</td>
<td>0.06650093630000001s</td>
<td>1.5GB/s</td>
</tr>
<tr>
<td>iouring</td>
<td>0.17542890139999998s</td>
<td>597.7MB/s</td>
</tr>
</tbody>
</table>
<p>Well it's similarly pretty bad. But our implementation ignores one
major aspect of io_uring: batching requests.</p>
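<p>To get a feel for how much batching could matter, here is a rough
back-of-the-envelope sketch in Go. It counts how many submit-and-wait
round trips are needed to write the 100MiB file. The helper name
<code>submissions</code> is made up for illustration, and the 4096-byte
buffer size is an assumption; substitute whatever
<code>BUFFER_SIZE</code> is in your code.</p>

```go
package main

import "fmt"

// submissions is a hypothetical helper (not part of any library). It returns
// how many submit-and-wait round trips are needed to write dataLen bytes in
// bufferSize chunks when each round trip carries up to nEntries chunks.
func submissions(dataLen, bufferSize, nEntries int) int {
	chunks := (dataLen + bufferSize - 1) / bufferSize // ceiling division
	return (chunks + nEntries - 1) / nEntries         // ceiling division
}

func main() {
	const dataLen = 104857600 // 100MiB, the file size used in this post
	const bufferSize = 4096   // assumed value of BUFFER_SIZE

	fmt.Println(submissions(dataLen, bufferSize, 1))   // one entry per submission
	fmt.Println(submissions(dataLen, bufferSize, 128)) // 128 entries per submission
}
```

<p>Under those assumptions a single-entry ring needs 25600 round trips
while a 128-entry ring needs 200. That says nothing about how long each
round trip takes, only how many of them there are.</p>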
<p>Let's do some refactoring!</p>
<h3 id="io_uring,-n-entries,-go">io_uring, N entries, Go</h3><p>To support submitting N entries, we're going to have an inner loop
that prepares up to N entries for io_uring.</p>
<p>Then we'll wait for the N submissions to complete and check their
results.</p>
<p>We'll keep going until we write the entire file.</p>
<p>All of this can stay inside the loop in <code>main</code>. I'm just dropping
preceding whitespace for nicer formatting here:</p>
<div class="highlight"><pre><span></span><span class="nx">benchmarkIOUringNEntries</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">nEntries</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">benchmark</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"io_uring_%d_entries"</span><span class="p">,</span><span class="w"> </span><span class="nx">nEntries</span><span class="p">),</span><span class="w"> </span><span class="nx">data</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">f</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">File</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">iour</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iouring</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">nEntries</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">iour</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="nx">requests</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="nx">iouring</span><span class="p">.</span><span class="nx">PrepRequest</span><span class="p">,</span><span class="w"> </span><span class="nx">nEntries</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">BUFFER_SIZE</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">nEntries</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">submittedEntries</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">nEntries</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">base</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">BUFFER_SIZE</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">base</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">submittedEntries</span><span class="o">++</span>
<span class="w"> </span><span class="nx">size</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">min</span><span class="p">(</span><span class="nx">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span><span class="o">-</span><span class="nx">base</span><span class="p">)</span>
<span class="w"> </span><span class="nx">requests</span><span class="p">[</span><span class="nx">j</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">iouring</span><span class="p">.</span><span class="nx">Pwrite</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="nx">f</span><span class="p">.</span><span class="nx">Fd</span><span class="p">()),</span><span class="w"> </span><span class="nx">data</span><span class="p">[</span><span class="nx">base</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="nx">base</span><span class="o">+</span><span class="nx">size</span><span class="p">],</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">base</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">submittedEntries</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iour</span><span class="p">.</span><span class="nx">SubmitRequests</span><span class="p">(</span><span class="nx">requests</span><span class="p">[:</span><span class="nx">submittedEntries</span><span class="p">],</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o"><-</span><span class="nx">res</span><span class="p">.</span><span class="nx">Done</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">res</span><span class="p">.</span><span class="nx">ErrResults</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">result</span><span class="p">.</span><span class="nx">ReturnInt</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">})</span>
<span class="p">}</span>
<span class="nx">benchmarkIOUringNEntries</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="nx">benchmarkIOUringNEntries</span><span class="p">(</span><span class="mi">128</span><span class="p">)</span>
</pre></div>
<p>There are some specific things in there to notice.</p>
<p>First, toward the end of the file we may not have <code>N</code> entries to
submit. We may have <code>1</code> or even <code>0</code>.</p>
<p>If we have <code>0</code> to submit, we must skip the submission entirely;
otherwise the Go library hangs. Similarly, if we don't slice
<code>requests</code> to <code>requests[:submittedEntries]</code>, the Go library will
segfault if <code>submittedEntries < N</code>.</p>
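<p>To see the batch boundaries without involving io_uring at all, here
is a minimal sketch of just the offset arithmetic. The
<code>chunk</code> type and <code>planBatches</code> function are
hypothetical names for illustration. They mirror the two loops above
but only record what would be submitted.</p>

```go
package main

import "fmt"

// chunk describes one write: an offset into the data and a length.
// (Hypothetical type for illustration only.)
type chunk struct {
	offset, size int
}

// planBatches is a hypothetical helper that mirrors the nested loops above.
// It splits dataLen bytes into batches of up to nEntries chunks, each chunk
// at most bufferSize bytes long.
func planBatches(dataLen, bufferSize, nEntries int) [][]chunk {
	var batches [][]chunk
	for i := 0; i < dataLen; i += bufferSize * nEntries {
		var batch []chunk
		for j := 0; j < nEntries; j++ {
			base := i + j*bufferSize
			if base >= dataLen {
				break // fewer than nEntries chunks remain
			}
			// The final chunk may be shorter than bufferSize.
			size := bufferSize
			if rest := dataLen - base; rest < size {
				size = rest
			}
			batch = append(batch, chunk{offset: base, size: size})
		}
		if len(batch) == 0 {
			// Defensive guard: since i < dataLen here, the first chunk
			// always exists and this branch is not expected to run.
			break
		}
		batches = append(batches, batch)
	}
	return batches
}

func main() {
	// 10 bytes, 4-byte chunks, 2 entries per batch.
	for _, batch := range planBatches(10, 4, 2) {
		fmt.Println(len(batch), batch)
	}
}
```

<p>With those toy numbers the first batch is full and the second holds a
single short chunk, which is exactly the case where slicing
<code>requests</code> down to the number of prepared entries matters.</p>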
<p>Other than that, let's build and run this!</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>gomain
$<span class="w"> </span>./gomain<span class="w"> </span>><span class="w"> </span>go.csv
$<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">"select column0 as method, avg(cast(column1 as double)) || 's' avg_time, format_bytes(avg(column2::double)::bigint) || '/s' as avg_throughput from 'go.csv' group by column0 order by avg(cast(column1 as double)) asc"</span>
</pre></div>
<table>
<thead><tr>
<th>method</th>
<th>avg_time</th>
<th>avg_throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>blocking</td>
<td>0.0740368s</td>
<td>1.4GB/s</td>
</tr>
<tr>
<td>io_uring_128_entries</td>
<td>0.127519s</td>
<td>836.6MB/s</td>
</tr>
<tr>
<td>io_uring_1_entries</td>
<td>0.46831579999999995s</td>
<td>226.9MB/s</td>
</tr>
</tbody>
</table>
<p>Now we're getting somewhere! That's still only about 60% of the
blocking throughput, but it's roughly a 3.7x improvement over using
only a single entry.</p>
<p>Let's port the N entry code to Zig.</p>
<h3 id="io_uring,-n-entries,-zig">io_uring, N entries, Zig</h3><p>Unlike Go, Zig doesn't have closures, so we'll have to make
<code>benchmarkIOUringNEntries</code> a top-level function and keep the calls to
it in the loop in <code>main</code>:</p>
<div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">;</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">SIZE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">104857600</span><span class="p">;</span><span class="w"> </span><span class="c1">// 100MiB</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">readNBytes</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="s">"/dev/random"</span><span class="p">,</span><span class="w"> </span><span class="n">SIZE</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">data</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">RUNS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">run</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">run</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">RUNS</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">run</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="s">"blocking"</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">stop</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">size</span><span class="p">]);</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">size</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">benchmarkIOUringNEntries</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">benchmarkIOUringNEntries</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="mi">128</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>And for the implementation itself, the only two big differences from
the first version are that we'll bulk-read completion events (<code>cqe</code>s)
and that we'll create and wait for many submissions at once.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">benchmarkIOUringNEntries</span><span class="p">(</span>
<span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span>
<span class="w"> </span><span class="n">data</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">nEntries</span><span class="o">:</span><span class="w"> </span><span class="n">u13</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">allocator</span><span class="p">.</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="s">"iouring_{}_entries"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">nEntries</span><span class="p">});</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">stop</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">ring</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">IO_Uring</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">nEntries</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">cqes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">io_uring_cqe</span><span class="p">,</span><span class="w"> </span><span class="n">nEntries</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">cqes</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">nEntries</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">submittedEntries</span><span class="o">:</span><span class="w"> </span><span class="kt">u32</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">j</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">nEntries</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">j</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">base</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">submittedEntries</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">base</span><span class="p">);</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">handle</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">base</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">size</span><span class="p">],</span><span class="w"> </span><span class="n">base</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">submitted</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">submit_and_wait</span><span class="p">(</span><span class="n">submittedEntries</span><span class="p">);</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">submitted</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">submittedEntries</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">waited</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">copy_cqes</span><span class="p">(</span><span class="n">cqes</span><span class="p">[</span><span class="mi">0</span><span class="p">..</span><span class="n">submitted</span><span class="p">],</span><span class="w"> </span><span class="n">submitted</span><span class="p">);</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">waited</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">submitted</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cqes</span><span class="p">[</span><span class="mi">0</span><span class="p">..</span><span class="n">submitted</span><span class="p">])</span><span class="w"> </span><span class="o">|*</span><span class="n">cqe</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">err</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="p">.</span><span class="n">SUCCESS</span><span class="p">);</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@as</span><span class="p">(</span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="nb">@intCast</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="p">));</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Let's build and run:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig
$<span class="w"> </span>./main<span class="w"> </span>><span class="w"> </span>zig.csv
$<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">"select column0 as method, avg(cast(column1 as double)) || 's' avg_time, format_bytes(avg(column2::double)::bigint) || '/s' as avg_throughput from 'zig.csv' group by column0 order by avg(cast(column1 as double)) asc"</span>
</pre></div>
<table>
<thead><tr>
<th>method</th>
<th>avg_time</th>
<th>avg_throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>blocking</td>
<td>0.0674331114s</td>
<td>1.5GB/s</td>
</tr>
<tr>
<td>iouring_128_entries</td>
<td>0.06773539590000001s</td>
<td>1.5GB/s</td>
</tr>
<tr>
<td>iouring_1_entries</td>
<td>0.1855542556s</td>
<td>569.9MB/s</td>
</tr>
</tbody>
</table>
<p>Huh, that's surprising! We caught up to blocking writes with io_uring
in Zig, but not in Go, even though we made good progress in Go.</p>
<h3 id="ring-buffers">Ring buffers</h3><p>But we can do a bit better. We're doing batching, but the API is
called "io_uring" not "io_batch". We're not even making use of the
ring buffer behavior io_uring gives us!</p>
<p>We are waiting for all submitted requests to complete. But there's no
reason to do that. Instead we should submit as much as we can. But we
should not block waiting for completions. We should handle completions
when they happen. And we should retry submissions until we're done
writing, retrying whenever there's no space for the moment.</p>
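<p>To make that control flow concrete, here is a toy model in plain Go.
No io_uring is involved: <code>toyRing</code>, <code>trySubmit</code>,
<code>tick</code>, <code>reap</code> and <code>writeAll</code> are all
hypothetical names invented for this sketch, and "the kernel" is
simulated by finishing one queued write per loop iteration. It only
illustrates the shape of the loop: submit until the queue is full, reap
whatever is ready without blocking, and repeat until every byte is
accounted for.</p>

```go
package main

import "fmt"

// toyRing is a hypothetical stand-in for an io_uring instance: a bounded
// submission queue plus a list of finished completions. No real I/O happens.
type toyRing struct {
	capacity int
	inFlight []int // sizes of submitted but unfinished writes
	done     []int // sizes of finished writes that have not been reaped yet
}

// trySubmit queues a write of n bytes, or reports false if the queue is full.
func (r *toyRing) trySubmit(n int) bool {
	if len(r.inFlight) >= r.capacity {
		return false
	}
	r.inFlight = append(r.inFlight, n)
	return true
}

// tick pretends the kernel finished the oldest in-flight write.
func (r *toyRing) tick() {
	if len(r.inFlight) > 0 {
		r.done = append(r.done, r.inFlight[0])
		r.inFlight = r.inFlight[1:]
	}
}

// reap returns whatever completions are ready, without waiting for more.
func (r *toyRing) reap() []int {
	out := r.done
	r.done = nil
	return out
}

// writeAll submits chunks until the queue is full, reaps completions
// without blocking, and loops until all bytes are both submitted and written.
func writeAll(total, chunk, capacity int) (written, iterations int) {
	r := &toyRing{capacity: capacity}
	offset := 0
	for offset < total || written < total {
		// Submit as much as fits right now; stop when the queue is full.
		for offset < total {
			n := chunk
			if total-offset < n {
				n = total - offset
			}
			if !r.trySubmit(n) {
				break // no space for the moment, retry on the next iteration
			}
			offset += n
		}
		r.tick() // simulated progress by "the kernel"
		for _, n := range r.reap() {
			written += n
		}
		iterations++
	}
	return written, iterations
}

func main() {
	written, iterations := writeAll(10*4096+100, 4096, 4)
	fmt.Println("written:", written, "iterations:", iterations)
}
```

<p>The loop condition checks both the submitted offset and the written
count, because submitting the last chunk does not mean it has completed
yet.</p>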
<p>Unfortunately the Go library doesn't seem to expose this ring behavior
of io_uring. Or I've missed it.</p>
<p>But we can do it in Zig. Let's go.</p>
<h3 id="io_uring,-ring-buffer,-zig">io_uring, ring buffer, Zig</h3><p>We need to change the way we track which offsets we have submitted so
far. We also need to keep the loop going until we are sure we have
<em>written</em> all data. And we need to stop blocking on the number of
entries we submitted; in fact, we never block at all.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">benchmarkIOUringNEntries</span><span class="p">(</span>
<span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span>
<span class="w"> </span><span class="n">data</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">nEntries</span><span class="o">:</span><span class="w"> </span><span class="n">u13</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">allocator</span><span class="p">.</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="s">"iouring_{}_entries"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">nEntries</span><span class="p">});</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">stop</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">ring</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">IO_Uring</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">nEntries</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">cqes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">io_uring_cqe</span><span class="p">,</span><span class="w"> </span><span class="n">nEntries</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">cqes</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">written</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">or</span><span class="w"> </span><span class="n">written</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">submittedEntries</span><span class="o">:</span><span class="w"> </span><span class="kt">u32</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">j</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">j</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">base</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">base</span><span class="p">);</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">handle</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">base</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">size</span><span class="p">],</span><span class="w"> </span><span class="n">base</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">SubmissionQueueFull</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">break</span><span class="p">,</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">submittedEntries</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">size</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">submit_and_wait</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cqesDone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">copy_cqes</span><span class="p">(</span><span class="n">cqes</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cqes</span><span class="p">[</span><span class="mi">0</span><span class="p">..</span><span class="n">cqesDone</span><span class="p">])</span><span class="w"> </span><span class="o">|*</span><span class="n">cqe</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">err</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="p">.</span><span class="n">SUCCESS</span><span class="p">);</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@as</span><span class="p">(</span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="nb">@intCast</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="p">));</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">);</span>
<span class="w"> </span><span class="n">written</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>The code got a bit simpler! Granted, we're omitting error handling.</p>
<p>Build and run:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig
$<span class="w"> </span>./main<span class="w"> </span>><span class="w"> </span>zig.csv
$<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">"select column0 as method, avg(cast(column1 as double)) || 's' avg_time, format_bytes(avg(column2::double)::bigint) || '/s' as avg_throughput from 'zig.csv' group by column0 order by avg(cast(column1 as double)) asc"</span>
</pre></div>
<table>
<thead><tr>
<th>method</th>
<th>avg_time</th>
<th>avg_throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>iouring_128_entries</td>
<td>0.06035423609999999s</td>
<td>1.7GB/s</td>
</tr>
<tr>
<td>iouring_1_entries</td>
<td>0.0610197624s</td>
<td>1.7GB/s</td>
</tr>
<tr>
<td>blocking</td>
<td>0.0671628515s</td>
<td>1.5GB/s</td>
</tr>
</tbody>
</table>
<p>Not bad!</p>
<h3 id="crank-it-up">Crank it up</h3><p>We've been writing 100MiB of data. Let's go up to 1GiB to see how
that affects things. Ideally, the more data we write, the better the
numbers reflect realistic long-term behavior.</p>
<p>In <code>main.zig</code> just change <code>SIZE</code> to <code>1073741824</code>. Rebuild and run:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig
$<span class="w"> </span>./main<span class="w"> </span>><span class="w"> </span>zig.csv
$<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">"select column0 as method, avg(cast(column1 as double)) || 's' avg_time, format_bytes(avg(column2::double)::bigint) || '/s' as avg_throughput from 'zig.csv' group by column0 order by avg(cast(column1 as double)) asc"</span>
</pre></div>
<table>
<thead><tr>
<th>method</th>
<th>avg_time</th>
<th>avg_throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>iouring_128_entries</td>
<td>0.6063814535s</td>
<td>1.7GB/s</td>
</tr>
<tr>
<td>iouring_1_entries</td>
<td>0.6167537295000001s</td>
<td>1.7GB/s</td>
</tr>
<tr>
<td>blocking</td>
<td>0.6831747749s</td>
<td>1.5GB/s</td>
</tr>
</tbody>
</table>
<p>No real difference in throughput, perfect!</p>
<p>Let's make one more change though. Let's up the <code>BUFFER_SIZE</code> from
4KiB to 1MiB.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig
$<span class="w"> </span>./main<span class="w"> </span>><span class="w"> </span>zig.csv
$<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">"select column0 as method, avg(cast(column1 as double)) || 's' avg_time, format_bytes(avg(column2::double)::bigint) || '/s' as avg_throughput from 'zig.csv' group by column0 order by avg(cast(column1 as double)) asc"</span>
</pre></div>
<table>
<thead><tr>
<th>method</th>
<th>avg_time</th>
<th>avg_throughput</th>
</tr>
</thead>
<tbody>
<tr>
<td>iouring_128_entries</td>
<td>0.2756831357s</td>
<td>3.8GB/s</td>
</tr>
<tr>
<td>iouring_1_entries</td>
<td>0.27575404880000004s</td>
<td>3.8GB/s</td>
</tr>
<tr>
<td>blocking</td>
<td>0.2833337046s</td>
<td>3.7GB/s</td>
</tr>
</tbody>
</table>
<p>Hey that's an improvement!</p>
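<p>A likely contributor (an assumption on my part, not something this
benchmark isolates) is simply that there are far fewer writes to issue
with the larger buffer. A quick Go sketch of the arithmetic, where
<code>chunks</code> is a hypothetical helper and the constants mirror the
sizes used in this post:</p>

```go
package main

import "fmt"

const (
	KiB = 1 << 10
	MiB = 1 << 20
	GiB = 1 << 30 // 1073741824, the new value of SIZE
)

// chunks returns how many writes of bufferSize bytes are needed to cover
// size bytes, rounding up for a final partial write.
func chunks(size, bufferSize int) int {
	return (size + bufferSize - 1) / bufferSize
}

func main() {
	fmt.Println("100MiB in 4KiB writes:", chunks(100*MiB, 4*KiB))
	fmt.Println("1GiB in 4KiB writes:  ", chunks(GiB, 4*KiB))
	fmt.Println("1GiB in 1MiB writes:  ", chunks(GiB, MiB))
}
```

<p>Going from 4KiB to 1MiB buffers cuts the number of writes for 1GiB by
a factor of 256, so any per-write overhead is paid that many fewer
times.</p>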
<h3 id="control">Control</h3><p>All these numbers are machine-specific obviously. So what does an
existing tool like
<a href="https://fio.readthedocs.io/en/latest/fio_doc.html">fio</a> say?
(Assuming I'm using it correctly. I await your corrections!)</p>
<p>With a 4KiB buffer size:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>fio<span class="w"> </span>--name<span class="o">=</span>fiotest<span class="w"> </span>--rw<span class="o">=</span>write<span class="w"> </span>--size<span class="o">=</span>1G<span class="w"> </span>--bs<span class="o">=</span>4k<span class="w"> </span>--group_reporting<span class="w"> </span>--ioengine<span class="o">=</span>sync
fiotest:<span class="w"> </span><span class="o">(</span><span class="nv">g</span><span class="o">=</span><span class="m">0</span><span class="o">)</span>:<span class="w"> </span><span class="nv">rw</span><span class="o">=</span>write,<span class="w"> </span><span class="nv">bs</span><span class="o">=(</span>R<span class="o">)</span><span class="w"> </span>4096B-4096B,<span class="w"> </span><span class="o">(</span>W<span class="o">)</span><span class="w"> </span>4096B-4096B,<span class="w"> </span><span class="o">(</span>T<span class="o">)</span><span class="w"> </span>4096B-4096B,<span class="w"> </span><span class="nv">ioengine</span><span class="o">=</span>sync,<span class="w"> </span><span class="nv">iodepth</span><span class="o">=</span><span class="m">1</span>
fio-3.33
Starting<span class="w"> </span><span class="m">1</span><span class="w"> </span>process
Jobs:<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">(</span><span class="nv">f</span><span class="o">=</span><span class="m">1</span><span class="o">)</span>
fiotest:<span class="w"> </span><span class="o">(</span><span class="nv">groupid</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">jobs</span><span class="o">=</span><span class="m">1</span><span class="o">)</span>:<span class="w"> </span><span class="nv">err</span><span class="o">=</span><span class="w"> </span><span class="m">0</span>:<span class="w"> </span><span class="nv">pid</span><span class="o">=</span><span class="m">2437359</span>:<span class="w"> </span>Thu<span class="w"> </span>Oct<span class="w"> </span><span class="m">19</span><span class="w"> </span><span class="m">23</span>:33:42<span class="w"> </span><span class="m">2023</span>
<span class="w"> </span>write:<span class="w"> </span><span class="nv">IOPS</span><span class="o">=</span>282k,<span class="w"> </span><span class="nv">BW</span><span class="o">=</span>1102MiB/s<span class="w"> </span><span class="o">(</span>1156MB/s<span class="o">)(</span>1024MiB/929msec<span class="o">)</span><span class="p">;</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>zone<span class="w"> </span>resets
<span class="w"> </span>clat<span class="w"> </span><span class="o">(</span>nsec<span class="o">)</span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">2349</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">54099</span>,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">2709</span>.48,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="m">1325</span>.83
<span class="w"> </span>lat<span class="w"> </span><span class="o">(</span>nsec<span class="o">)</span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">2390</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">54139</span>,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">2752</span>.89,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="m">1334</span>.62
<span class="w"> </span>clat<span class="w"> </span>percentiles<span class="w"> </span><span class="o">(</span>nsec<span class="o">)</span>:
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">1</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2416</span><span class="o">]</span>,<span class="w"> </span><span class="m">5</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2416</span><span class="o">]</span>,<span class="w"> </span><span class="m">10</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2416</span><span class="o">]</span>,<span class="w"> </span><span class="m">20</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2448</span><span class="o">]</span>,
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">30</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2448</span><span class="o">]</span>,<span class="w"> </span><span class="m">40</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2448</span><span class="o">]</span>,<span class="w"> </span><span class="m">50</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2448</span><span class="o">]</span>,<span class="w"> </span><span class="m">60</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2480</span><span class="o">]</span>,
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">70</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2512</span><span class="o">]</span>,<span class="w"> </span><span class="m">80</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2544</span><span class="o">]</span>,<span class="w"> </span><span class="m">90</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2832</span><span class="o">]</span>,<span class="w"> </span><span class="m">95</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">3504</span><span class="o">]</span>,
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">99</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">5792</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.50th<span class="o">=[</span><span class="m">15296</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.90th<span class="o">=[</span><span class="m">19584</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.95th<span class="o">=[</span><span class="m">20096</span><span class="o">]</span>,
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">99</span>.99th<span class="o">=[</span><span class="m">22656</span><span class="o">]</span>
<span class="w"> </span>bw<span class="w"> </span><span class="o">(</span><span class="w"> </span>KiB/s<span class="o">)</span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">940856</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">940856</span>,<span class="w"> </span><span class="nv">per</span><span class="o">=</span><span class="m">83</span>.36%,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">940856</span>.00,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="w"> </span><span class="m">0</span>.00,<span class="w"> </span><span class="nv">samples</span><span class="o">=</span><span class="m">1</span>
<span class="w"> </span>iops<span class="w"> </span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">235214</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">235214</span>,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">235214</span>.00,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="w"> </span><span class="m">0</span>.00,<span class="w"> </span><span class="nv">samples</span><span class="o">=</span><span class="m">1</span>
<span class="w"> </span>lat<span class="w"> </span><span class="o">(</span>usec<span class="o">)</span><span class="w"> </span>:<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">97</span>.22%,<span class="w"> </span><span class="nv">10</span><span class="o">=</span><span class="m">2</span>.03%,<span class="w"> </span><span class="nv">20</span><span class="o">=</span><span class="m">0</span>.71%,<span class="w"> </span><span class="nv">50</span><span class="o">=</span><span class="m">0</span>.04%,<span class="w"> </span><span class="nv">100</span><span class="o">=</span><span class="m">0</span>.01%
<span class="w"> </span>cpu<span class="w"> </span>:<span class="w"> </span><span class="nv">usr</span><span class="o">=</span><span class="m">17</span>.35%,<span class="w"> </span><span class="nv">sys</span><span class="o">=</span><span class="m">82</span>.11%,<span class="w"> </span><span class="nv">ctx</span><span class="o">=</span><span class="m">26</span>,<span class="w"> </span><span class="nv">majf</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">minf</span><span class="o">=</span><span class="m">11</span>
<span class="w"> </span>IO<span class="w"> </span>depths<span class="w"> </span>:<span class="w"> </span><span class="nv">1</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">2</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>><span class="o">=</span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%
<span class="w"> </span>submit<span class="w"> </span>:<span class="w"> </span><span class="nv">0</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>><span class="o">=</span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%
<span class="w"> </span><span class="nb">complete</span><span class="w"> </span>:<span class="w"> </span><span class="nv">0</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>><span class="o">=</span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%
<span class="w"> </span>issued<span class="w"> </span>rwts:<span class="w"> </span><span class="nv">total</span><span class="o">=</span><span class="m">0</span>,262144,0,0<span class="w"> </span><span class="nv">short</span><span class="o">=</span><span class="m">0</span>,0,0,0<span class="w"> </span><span class="nv">dropped</span><span class="o">=</span><span class="m">0</span>,0,0,0
<span class="w"> </span>latency<span class="w"> </span>:<span class="w"> </span><span class="nv">target</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">window</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">percentile</span><span class="o">=</span><span class="m">100</span>.00%,<span class="w"> </span><span class="nv">depth</span><span class="o">=</span><span class="m">1</span>
Run<span class="w"> </span>status<span class="w"> </span>group<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">(</span>all<span class="w"> </span><span class="nb">jobs</span><span class="o">)</span>:
<span class="w"> </span>WRITE:<span class="w"> </span><span class="nv">bw</span><span class="o">=</span>1102MiB/s<span class="w"> </span><span class="o">(</span>1156MB/s<span class="o">)</span>,<span class="w"> </span>1102MiB/s-1102MiB/s<span class="w"> </span><span class="o">(</span>1156MB/s-1156MB/s<span class="o">)</span>,<span class="w"> </span><span class="nv">io</span><span class="o">=</span>1024MiB<span class="w"> </span><span class="o">(</span>1074MB<span class="o">)</span>,<span class="w"> </span><span class="nv">run</span><span class="o">=</span><span class="m">929</span>-929msec
</pre></div>
<p>1.2GB/s is in the ballpark of what we got.</p>
<p>And with a 1MiB buffer size?</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>fio<span class="w"> </span>--name<span class="o">=</span>fiotest<span class="w"> </span>--rw<span class="o">=</span>write<span class="w"> </span>--size<span class="o">=</span>1G<span class="w"> </span>--bs<span class="o">=</span>1M<span class="w"> </span>--group_reporting<span class="w"> </span>--ioengine<span class="o">=</span>sync
fiotest:<span class="w"> </span><span class="o">(</span><span class="nv">g</span><span class="o">=</span><span class="m">0</span><span class="o">)</span>:<span class="w"> </span><span class="nv">rw</span><span class="o">=</span>write,<span class="w"> </span><span class="nv">bs</span><span class="o">=(</span>R<span class="o">)</span><span class="w"> </span>1024KiB-1024KiB,<span class="w"> </span><span class="o">(</span>W<span class="o">)</span><span class="w"> </span>1024KiB-1024KiB,<span class="w"> </span><span class="o">(</span>T<span class="o">)</span><span class="w"> </span>1024KiB-1024KiB,<span class="w"> </span><span class="nv">ioengine</span><span class="o">=</span>sync,<span class="w"> </span><span class="nv">iodepth</span><span class="o">=</span><span class="m">1</span>
fio-3.33
Starting<span class="w"> </span><span class="m">1</span><span class="w"> </span>process
fiotest:<span class="w"> </span>Laying<span class="w"> </span>out<span class="w"> </span>IO<span class="w"> </span>file<span class="w"> </span><span class="o">(</span><span class="m">1</span><span class="w"> </span>file<span class="w"> </span>/<span class="w"> </span>1024MiB<span class="o">)</span>
fiotest:<span class="w"> </span><span class="o">(</span><span class="nv">groupid</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">jobs</span><span class="o">=</span><span class="m">1</span><span class="o">)</span>:<span class="w"> </span><span class="nv">err</span><span class="o">=</span><span class="w"> </span><span class="m">0</span>:<span class="w"> </span><span class="nv">pid</span><span class="o">=</span><span class="m">2437239</span>:<span class="w"> </span>Thu<span class="w"> </span>Oct<span class="w"> </span><span class="m">19</span><span class="w"> </span><span class="m">23</span>:32:09<span class="w"> </span><span class="m">2023</span>
<span class="w"> </span>write:<span class="w"> </span><span class="nv">IOPS</span><span class="o">=</span><span class="m">3953</span>,<span class="w"> </span><span class="nv">BW</span><span class="o">=</span>3954MiB/s<span class="w"> </span><span class="o">(</span>4146MB/s<span class="o">)(</span>1024MiB/259msec<span class="o">)</span><span class="p">;</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>zone<span class="w"> </span>resets
<span class="w"> </span>clat<span class="w"> </span><span class="o">(</span>usec<span class="o">)</span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">221</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">1205</span>,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">241</span>.83,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="m">43</span>.93
<span class="w"> </span>lat<span class="w"> </span><span class="o">(</span>usec<span class="o">)</span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">228</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">1250</span>,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">251</span>.68,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="m">45</span>.80
<span class="w"> </span>clat<span class="w"> </span>percentiles<span class="w"> </span><span class="o">(</span>usec<span class="o">)</span>:
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">1</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">225</span><span class="o">]</span>,<span class="w"> </span><span class="m">5</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">225</span><span class="o">]</span>,<span class="w"> </span><span class="m">10</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">227</span><span class="o">]</span>,<span class="w"> </span><span class="m">20</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">227</span><span class="o">]</span>,
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">30</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">231</span><span class="o">]</span>,<span class="w"> </span><span class="m">40</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">233</span><span class="o">]</span>,<span class="w"> </span><span class="m">50</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">235</span><span class="o">]</span>,<span class="w"> </span><span class="m">60</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">239</span><span class="o">]</span>,
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">70</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">243</span><span class="o">]</span>,<span class="w"> </span><span class="m">80</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">249</span><span class="o">]</span>,<span class="w"> </span><span class="m">90</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">262</span><span class="o">]</span>,<span class="w"> </span><span class="m">95</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">269</span><span class="o">]</span>,
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">99</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">302</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.50th<span class="o">=[</span><span class="w"> </span><span class="m">318</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.90th<span class="o">=[</span><span class="w"> </span><span class="m">1074</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.95th<span class="o">=[</span><span class="w"> </span><span class="m">1205</span><span class="o">]</span>,
<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">99</span>.99th<span class="o">=[</span><span class="w"> </span><span class="m">1205</span><span class="o">]</span>
<span class="w"> </span>lat<span class="w"> </span><span class="o">(</span>usec<span class="o">)</span><span class="w"> </span>:<span class="w"> </span><span class="nv">250</span><span class="o">=</span><span class="m">80</span>.96%,<span class="w"> </span><span class="nv">500</span><span class="o">=</span><span class="m">18</span>.85%
<span class="w"> </span>lat<span class="w"> </span><span class="o">(</span>msec<span class="o">)</span><span class="w"> </span>:<span class="w"> </span><span class="nv">2</span><span class="o">=</span><span class="m">0</span>.20%
<span class="w"> </span>cpu<span class="w"> </span>:<span class="w"> </span><span class="nv">usr</span><span class="o">=</span><span class="m">4</span>.26%,<span class="w"> </span><span class="nv">sys</span><span class="o">=</span><span class="m">94</span>.96%,<span class="w"> </span><span class="nv">ctx</span><span class="o">=</span><span class="m">3</span>,<span class="w"> </span><span class="nv">majf</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">minf</span><span class="o">=</span><span class="m">10</span>
<span class="w"> </span>IO<span class="w"> </span>depths<span class="w"> </span>:<span class="w"> </span><span class="nv">1</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">2</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>><span class="o">=</span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%
<span class="w"> </span>submit<span class="w"> </span>:<span class="w"> </span><span class="nv">0</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>><span class="o">=</span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%
<span class="w"> </span><span class="nb">complete</span><span class="w"> </span>:<span class="w"> </span><span class="nv">0</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>><span class="o">=</span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%
<span class="w"> </span>issued<span class="w"> </span>rwts:<span class="w"> </span><span class="nv">total</span><span class="o">=</span><span class="m">0</span>,1024,0,0<span class="w"> </span><span class="nv">short</span><span class="o">=</span><span class="m">0</span>,0,0,0<span class="w"> </span><span class="nv">dropped</span><span class="o">=</span><span class="m">0</span>,0,0,0
<span class="w"> </span>latency<span class="w"> </span>:<span class="w"> </span><span class="nv">target</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">window</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">percentile</span><span class="o">=</span><span class="m">100</span>.00%,<span class="w"> </span><span class="nv">depth</span><span class="o">=</span><span class="m">1</span>
Run<span class="w"> </span>status<span class="w"> </span>group<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">(</span>all<span class="w"> </span><span class="nb">jobs</span><span class="o">)</span>:
<span class="w"> </span>WRITE:<span class="w"> </span><span class="nv">bw</span><span class="o">=</span>3954MiB/s<span class="w"> </span><span class="o">(</span>4146MB/s<span class="o">)</span>,<span class="w"> </span>3954MiB/s-3954MiB/s<span class="w"> </span><span class="o">(</span>4146MB/s-4146MB/s<span class="o">)</span>,<span class="w"> </span><span class="nv">io</span><span class="o">=</span>1024MiB<span class="w"> </span><span class="o">(</span>1074MB<span class="o">)</span>,<span class="w"> </span><span class="nv">run</span><span class="o">=</span><span class="m">259</span>-259msec
</pre></div>
<p>3.9GiB/s (about 4.1GB/s) is also roughly in the same ballpark as what we got.</p>
<p>Our code seems reasonable!</p>
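<p>As a units check, the two figures fio prints are the same measurement in binary (MiB) and decimal (MB) units. A small Go sketch reproduces both summary lines from the 1GiB written and the reported run times. The helper name <code>throughput</code> is made up for this illustration and is not part of fio or the io-playground code.</p>

```go
package main

import (
	"fmt"
	"time"
)

// throughput is a hypothetical helper for this sketch. It returns the write
// rate in MiB/s (2^20 bytes) and in MB/s (10^6 bytes).
func throughput(bytes int64, elapsed time.Duration) (mibPerSec, mbPerSec float64) {
	secs := elapsed.Seconds()
	return float64(bytes) / (1 << 20) / secs, float64(bytes) / 1e6 / secs
}

func main() {
	const oneGiB = 1 << 30 // matches --size=1G

	// Run times taken from the two fio summaries (run=929msec and run=259msec).
	for _, run := range []time.Duration{929 * time.Millisecond, 259 * time.Millisecond} {
		mib, mb := throughput(oneGiB, run)
		fmt.Printf("%v: %.0f MiB/s (%.0f MB/s)\n", run, mib, mb)
	}
}
```

<p>With the run times above this gives 1102 MiB/s (1156 MB/s) and 3954 MiB/s (4146 MB/s), matching the fio summaries.</p>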
<h3 id="what's-next?">What's next?</h3><p>None of this is original. <code>fio</code> is a similar tool, written in C, with
many different IO engines including <code>libaio</code> and <code>writev</code> support. And
it has many different IO workloads.</p>
<p>But it's been enjoyable to learn more about these APIs: how to program
them and how they compare to each other.</p>
<p>So next steps could include adding additional IO engines or IO
workloads.</p>
<p>Also, either I need to understand Iceber's Go library better or its
API needs to be loosened up a little bit so we can get that awesome
ring buffer behavior we could use from Zig.</p>
<p>Keep an eye out here and on my <a href="https://github.com/eatonphil/io-playground">io-playground
repo</a>!</p>
<h3 id="selected-responses-after-publication">Selected responses after publication</h3><ul>
<li>wizeman on lobsters
<a href="https://lobste.rs/s/rimkv3/io_uring_basics_writing_file_disk#c_qvlx5u">suggests</a>
measuring at least 30 seconds' worth of writing data and
<code>fsync()</code>-ing if you want to test the entire IO subsystem and not
just hit the kernel cache.</li>
</ul>
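<p>A rough Go sketch of that suggestion: stop the timer only after a final fsync, so the measurement includes flushing to the device rather than just copying into the page cache. The helper name <code>timedWrite</code> and the sizes are assumptions for illustration, and a real benchmark would write for much longer, as the comment recommends.</p>

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// timedWrite is a hypothetical helper. It writes total bytes to path in
// bufSize chunks, then calls fsync through (*os.File).Sync, and returns the
// bytes written and the elapsed time including the sync.
func timedWrite(path string, total, bufSize int) (written int, elapsed time.Duration, err error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()

	buf := make([]byte, bufSize)
	start := time.Now()
	for written < total {
		n := bufSize
		if total-written < n {
			n = total - written
		}
		if _, err := f.Write(buf[:n]); err != nil {
			return written, time.Since(start), err
		}
		written += n
	}
	// Without this call the timer mostly measures copies into the page cache.
	if err := f.Sync(); err != nil {
		return written, time.Since(start), err
	}
	return written, time.Since(start), nil
}

func main() {
	dir, err := os.MkdirTemp("", "fsync-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// 64MiB in 1MiB chunks keeps the demo short; these sizes are arbitrary.
	n, elapsed, err := timedWrite(filepath.Join(dir, "out.bin"), 64<<20, 1<<20)
	if err != nil {
		panic(err)
	}
	fmt.Printf("wrote %d bytes in %v including fsync\n", n, elapsed)
}
```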
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Digging into io_uring has been on my list for a long time now! This week I finally made made some progress.<br><br>Let's go on a little journey through a few increasingly complex (and useful) implementations of writing a file to disk with io_uring.<a href="https://t.co/gR9K2OQs2R">https://t.co/gR9K2OQs2R</a> <a href="https://t.co/TMaC8QYL6k">pic.twitter.com/TMaC8QYL6k</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1715151609615773965?ref_src=twsrc%5Etfw">October 19, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2023-10-19-write-file-to-disk-with-io_uring.htmlThu, 19 Oct 2023 00:00:00 +0000
- Go database driver overhead on insert-heavy workloadshttp://notes.eatonphil.com/2023-10-05-go-database-sql-overhead-on-insert-heavy-workloads.html<p>The most popular SQLite and PostgreSQL database drivers in Go are
(roughly) 20-76% slower than alternative Go drivers on insert-heavy
benchmarks of mine. So if you are bulk-inserting data with Go (and
potentially also bulk-retrieving data with Go), you may want to
consider the driver carefully. And you may want to consider avoiding
<code>database/sql</code>.</p>
<p>Some driver authors have
<a href="https://github.com/lib/pq/issues/771">noted</a> and
<a href="https://github.com/ClickHouse/clickhouse-go/tree/main#benchmark">benchmarked</a>
issues with
<a href="https://github.com/jackc/pgx#choosing-between-the-pgx-and-databasesql-interfaces">database/sql</a>.</p>
<p>So it may be the case that <code>database/sql</code> is responsible for some of
this overhead. And indeed each comparison in this post is between a
driver used through <code>database/sql</code> and a driver that avoids it. This post
won't specifically prove that the variation is due to the
<code>database/sql</code> interface. But that doesn't change the premise.</p>
<p class="note">
Not covered in this post but something to consider:
JetBrains <a href="https://blog.jetbrains.com/go/2023/04/27/comparing-db-packages/">has
suggested</a> that other frontends like sqlc, sqlx, and GORM do
worse than <code>database/sql</code>.
</p><p>This post is built on the workload, environment, libraries, and
methodology in my <a href="https://github.com/eatonphil/databases-intuition">databases-intuition repo on
GitHub</a>. See the
repo for details that will help you reproduce or correct me.</p>
<h3 id="insert-workload">INSERT workload</h3><p>In this workload, the data is random and there are no indexes. Neither
of these aspects matters for this post, though, because we're comparing
behavior within the same database among different drivers. This was
just a workload I already had.</p>
<p>Two different data sizes are tested:</p>
<ol>
<li>10M rows with 16 columns, each column is 32 bytes</li>
<li>10M rows with 3 columns, each column is 8 bytes</li>
</ol>
<p>Each test is run 10 times and we record median, standard deviation,
min, max and throughput.</p>
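<p>A minimal Go sketch of how such a summary could be computed. The names <code>summary</code> and <code>summarize</code> are made up here, the sample standard deviation is an assumption, and the timings in <code>main</code> are invented numbers rather than results; the repo may compute these differently.</p>

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// summary holds the statistics recorded for a set of runs.
type summary struct {
	Median, Stddev, Min, Max float64
}

// summarize is a hypothetical helper. It computes the median, the sample
// standard deviation (n-1 denominator), the minimum and the maximum.
func summarize(samples []float64) summary {
	n := len(samples)
	if n == 0 {
		return summary{}
	}
	sorted := append([]float64(nil), samples...) // leave the input untouched
	sort.Float64s(sorted)

	median := sorted[n/2]
	if n%2 == 0 {
		median = (sorted[n/2-1] + sorted[n/2]) / 2
	}

	var sum float64
	for _, s := range sorted {
		sum += s
	}
	mean := sum / float64(n)

	var squares float64
	for _, s := range sorted {
		squares += (s - mean) * (s - mean)
	}
	stddev := 0.0
	if n > 1 {
		stddev = math.Sqrt(squares / float64(n-1))
	}
	return summary{Median: median, Stddev: stddev, Min: sorted[0], Max: sorted[n-1]}
}

func main() {
	const rows = 10_000_000
	// Invented timings in seconds, only to exercise the helper.
	timings := []float64{10.2, 10.0, 10.4, 10.1, 10.3}

	throughputs := make([]float64, len(timings))
	for i, t := range timings {
		throughputs[i] = rows / t
	}
	t, r := summarize(timings), summarize(throughputs)
	fmt.Printf("Timing: %.2f ± %.2fs, Min: %.2fs, Max: %.2fs\n", t.Median, t.Stddev, t.Min, t.Max)
	fmt.Printf("Throughput: %.2f ± %.2f rows/s, Min: %.2f rows/s, Max: %.2f rows/s\n", r.Median, r.Stddev, r.Min, r.Max)
}
```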
<h3 id="sqlite">SQLite</h3><p>Both variations presented here load 10M rows using a single prepared
statement called for each row within a single transaction.</p>
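<p>The shape of that loop with <code>database/sql</code> looks roughly like the sketch below. So that it runs without a database, it uses a stub driver that only counts calls; the stub types, the table name <code>testtable</code> and the helper names are all assumptions for illustration, and real code would open an actual driver instead.</p>

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// stubDriver is a stand-in that only counts calls, so the sketch runs
// without a database. It is not a real SQLite driver.
type stubDriver struct{ execs, commits int }

func (d *stubDriver) Open(string) (driver.Conn, error)             { return &stubConn{d}, nil }
func (d *stubDriver) Connect(context.Context) (driver.Conn, error) { return &stubConn{d}, nil }
func (d *stubDriver) Driver() driver.Driver                        { return d }

type stubConn struct{ d *stubDriver }

func (c *stubConn) Prepare(string) (driver.Stmt, error) { return &stubStmt{c.d}, nil }
func (c *stubConn) Close() error                        { return nil }
func (c *stubConn) Begin() (driver.Tx, error)           { return &stubTx{c.d}, nil }

type stubStmt struct{ d *stubDriver }

func (s *stubStmt) Close() error  { return nil }
func (s *stubStmt) NumInput() int { return -1 } // -1 skips argument-count checks
func (s *stubStmt) Exec([]driver.Value) (driver.Result, error) {
	s.d.execs++
	return driver.RowsAffected(1), nil
}
func (s *stubStmt) Query([]driver.Value) (driver.Rows, error) {
	return nil, errors.New("not supported by the stub")
}

type stubTx struct{ d *stubDriver }

func (t *stubTx) Commit() error   { t.d.commits++; return nil }
func (t *stubTx) Rollback() error { return nil }

// insertAll executes one prepared INSERT per row inside a single transaction.
func insertAll(db *sql.DB, rows [][]any) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // returns an ignored error after a successful Commit

	stmt, err := tx.Prepare("INSERT INTO testtable VALUES (?, ?, ?)")
	if err != nil {
		return err
	}
	defer stmt.Close()

	for _, row := range rows {
		if _, err := stmt.Exec(row...); err != nil {
			return err
		}
	}
	return tx.Commit()
}

// runSketch inserts n three-column rows through the stub and reports how
// many statement executions and commits the driver saw.
func runSketch(n int) (execs, commits int, err error) {
	d := &stubDriver{}
	db := sql.OpenDB(d)
	defer db.Close()

	rows := make([][]any, n)
	for i := range rows {
		rows[i] = []any{int64(i), "b", "c"}
	}
	err = insertAll(db, rows)
	return d.execs, d.commits, err
}

func main() {
	fmt.Println(runSketch(1000)) // 1000 1 <nil>
}
```

<p>Every <code>stmt.Exec</code> call here passes through the <code>database/sql</code> layer before reaching the driver, which is the per-row path the alternative drivers avoid.</p>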
<p>The most popular driver is
<a href="https://github.com/mattn/go-sqlite3">mattn/go-sqlite3</a>.</p>
<p>It is roughly 20-40% slower than another driver that avoids
<code>database/sql</code>.</p>
<p>10M Rows, 16 columns, each column 32 bytes:</p>
<div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">56</span>.53<span class="w"> </span>±<span class="w"> </span><span class="m">1</span>.26s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">55</span>.05s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">59</span>.62s
Throughput:<span class="w"> </span><span class="m">176</span>,893.65<span class="w"> </span>±<span class="w"> </span><span class="m">3</span>,853.90<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">167</span>,719.97<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">181</span>,646.02<span class="w"> </span>rows/s
</pre></div>
<p>10M Rows, 3 columns, each column 8 bytes:</p>
<div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">15</span>.92<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.25s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">15</span>.69s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">16</span>.67s
Throughput:<span class="w"> </span><span class="m">628</span>,044.37<span class="w"> </span>±<span class="w"> </span><span class="m">9</span>,703.92<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">599</span>,852.91<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">637</span>,435.60<span class="w"> </span>rows/s
</pre></div>
<p>The other driver I tested is my own fork of
<a href="https://github.com/bvinc/go-sqlite-lite">bvinc/go-sqlite-lite</a> called
<a href="https://github.com/eatonphil/gosqlite">eatonphil/gosqlite</a>. I forked
it because it is unmaintained and I wanted to bring it up-to-date for
tests like this.</p>
<p>10M Rows, 16 columns, each column 32 bytes:</p>
<div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">45</span>.51<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.70s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">43</span>.72s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">45</span>.93s
Throughput:<span class="w"> </span><span class="m">219</span>,729.65<span class="w"> </span>±<span class="w"> </span><span class="m">3</span>,447.56<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">217</span>,742.98<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">228</span>,711.51<span class="w"> </span>rows/s
</pre></div>
<p>10M Rows, 3 columns, each column 8 bytes:</p>
<div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">10</span>.44<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.20s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">10</span>.02s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">10</span>.68s
Throughput:<span class="w"> </span><span class="m">957</span>,939.60<span class="w"> </span>±<span class="w"> </span><span class="m">18</span>,879.43<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">936</span>,114.60<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">998</span>,426.62<span class="w"> </span>rows/s
</pre></div>
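<p>One way to arrive at "roughly 20-40% slower" from the timings above is to take the extra time the slower driver needs as a fraction of its own run time. That definition is an assumption for this sketch, and the helper name is made up; other definitions, such as the ratio of throughputs, give different percentages.</p>

```go
package main

import "fmt"

// extraTimePct is a hypothetical helper. It returns the share of the slower
// run's time that the faster run avoids: (slow - fast) / slow * 100.
func extraTimePct(slow, fast float64) float64 {
	return (slow - fast) / slow * 100
}

func main() {
	// Timings in seconds from the SQLite results above
	// (mattn/go-sqlite3 first, eatonphil/gosqlite second).
	fmt.Printf("16 columns x 32 bytes: %.1f%%\n", extraTimePct(56.53, 45.51))
	fmt.Printf("3 columns x 8 bytes:   %.1f%%\n", extraTimePct(15.92, 10.44))
}
```

<p>This prints 19.5% and 34.4% for the two data sizes.</p>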
<h3 id="postgresql">PostgreSQL</h3><p>Both variations presented use PostgreSQL's <a href="https://www.postgresql.org/docs/current/sql-copy.html"><code>COPY
FROM</code></a>
support. This is significantly faster for PostgreSQL than executing a
prepared statement for each row, as we do for
SQLite. (<a href="https://github.com/eatonphil/databases-intuition#postgresql-prepared-insert">Here</a>
are my results for doing prepared statement INSERTs in PostgreSQL if
you are curious.)</p>
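<p>Part of the difference is that <code>COPY FROM</code> takes rows as one stream of data instead of one statement execution per row. As an illustration only, here is a Go sketch that encodes rows in PostgreSQL's default <code>COPY</code> text format (tab-separated columns, one row per line, <code>\N</code> for NULL). The helper name <code>encodeCopyText</code> is made up, and this is not necessarily how either driver encodes rows on the wire.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// copyEscaper escapes the characters that are special in COPY text format.
var copyEscaper = strings.NewReplacer(
	"\\", "\\\\",
	"\t", "\\t",
	"\n", "\\n",
	"\r", "\\r",
)

// encodeCopyText is a hypothetical helper. It renders rows in the default
// COPY text format; a nil column is written as NULL (\N).
func encodeCopyText(rows [][]*string) string {
	var b strings.Builder
	for _, row := range rows {
		for i, col := range row {
			if i > 0 {
				b.WriteByte('\t')
			}
			if col == nil {
				b.WriteString(`\N`)
			} else {
				b.WriteString(copyEscaper.Replace(*col))
			}
		}
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	a, b := "first", "has\ttab"
	// Each row becomes one line; the embedded tab is escaped as \t.
	fmt.Print(encodeCopyText([][]*string{{&a, &b}, {&a, nil}}))
}
```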
<p>The most popular PostgreSQL driver is
<a href="https://github.com/lib/pq">lib/pq</a>. The <a href="https://github.com/lib/pq/issues/771">performance
issues</a> with lib/pq are
<a href="https://github.com/jackc/pgx#choosing-between-the-pgx-and-databasesql-interfaces">well-known</a>,
and the <a href="https://github.com/lib/pq#status">repo itself</a> is marked as
no longer developed.</p>
<p>It is roughly 44-76% slower than an alternative driver that avoids
<code>database/sql</code>.</p>
<p>10M Rows, 16 columns, each column 32 bytes:</p>
<div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">104</span>.53<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>.40s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">102</span>.57s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">110</span>.08s
Throughput:<span class="w"> </span><span class="m">95</span>,665.37<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>,129.25<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">90</span>,847.08<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">97</span>,490.96<span class="w"> </span>rows/s
</pre></div>
<p>10M Rows, 3 columns, each column 8 bytes:</p>
<div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">8</span>.16<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.43s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">7</span>.44s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">8</span>.80s
Throughput:<span class="w"> </span><span class="m">1</span>,225,986.47<span class="w"> </span>±<span class="w"> </span><span class="m">66</span>,631.53<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">1</span>,136,581.82<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">1</span>,343,441.37<span class="w"> </span>rows/s
</pre></div>
<p>The other driver I tested is
<a href="https://github.com/jackc/pgx">jackc/pgx</a>, without <code>database/sql</code>.</p>
<p>10M Rows, 16 columns, each column 32 bytes:</p>
<div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">46</span>.54<span class="w"> </span>±<span class="w"> </span><span class="m">1</span>.60s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">44</span>.09s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">49</span>.51s
Throughput:<span class="w"> </span><span class="m">214</span>,869.42<span class="w"> </span>±<span class="w"> </span><span class="m">7</span>,265.10<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">201</span>,991.37<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">226</span>,801.07<span class="w"> </span>rows/s
</pre></div>
<p>10M Rows, 3 columns, each column 8 bytes:</p>
<div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">5</span>.20<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.44s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">4</span>.71s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">5</span>.96s
Throughput:<span class="w"> </span><span class="m">1</span>,923,722.79<span class="w"> </span>±<span class="w"> </span><span class="m">156</span>,820.46<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">1</span>,676,894.32<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">2</span>,124,966.60<span class="w"> </span>rows/s
</pre></div>
<p>The discrepancies here are even greater than with the different SQLite
drivers.</p>
<h3 id="workloads-with-small-resultset">Workloads with small resultset</h3><p>I won't go into as much detail but if you're doing queries that don't
return many rows, the difference between drivers is negligible.</p>
<p>See <a href="https://github.com/eatonphil/databases-intuition#selects">here</a> for details.</p>
<h3 id="conclusion">Conclusion</h3><p>If you are doing INSERT-heavy workloads, or you are processing a large
number of rows returned from your SQL database, you might want to try
benchmarking the same workload with different drivers.</p>
<p>And specifically, there is likely no good reason to use <code>lib/pq</code>
anymore for accessing PostgreSQL from Go. Just use jackc/pgx.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">For INSERT-heavy workloads in Go, you may want to switch database drivers. For PostgreSQL and SQLite, the popular drivers are 20-76% slower for this workload in my tests.<br><br>Some driver developers have reported issues with database/sql as an interface.<a href="https://t.co/NLVp0P2uiV">https://t.co/NLVp0P2uiV</a> <a href="https://t.co/RxTbgMZ1MG">pic.twitter.com/RxTbgMZ1MG</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1710249941904351718?ref_src=twsrc%5Etfw">October 6, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2023-10-05-go-database-sql-overhead-on-insert-heavy-workloads.htmlThu, 05 Oct 2023 00:00:00 +0000
- Intercepting and modifying Linux system calls with ptracehttp://notes.eatonphil.com/2023-10-01-intercepting-and-modifying-linux-system-calls-with-ptrace.html<p>How software fails is interesting. But real-world errors can
manifest infrequently. <a href="https://course.ece.cmu.edu/~ece749/docs/faultInjectionSurvey.pdf">Fault
injection</a>
is a formal-sounding term that just means: trying to explicitly
trigger errors in the hopes of discovering bad logic, typically
during automated tests.</p>
<p><a href="https://github.com/jepsen-io/jepsen">Jepsen</a>
and <a href="https://github.com/Netflix/chaosmonkey">ChaosMonkey</a> are two
famous examples that help to trigger process and network failure. But
what about disk and filesystem errors?</p>
<p>A few avenues seem worth investigating:</p>
<ul>
<li>A custom FUSE filesystem</li>
<li>An LD_PRELOAD interception layer</li>
<li>A ptrace system call interception layer</li>
<li>A <code>SECCOMP_RET_TRAP</code> interception layer</li>
<li>Or, symbolic analysis a la <a href="https://research.cs.wisc.edu/adsl/Publications/alice-osdi14.html">Alice from University of
Wisconsin-Madison</a></li>
</ul>
<p>I would like to try out FUSE sometime. But an LD_PRELOAD layer only works
if IO goes through libc, which won't be the case for all
programs. ptrace is something I've wanted to dig into for years since
learning about
<a href="https://www.usenix.org/system/files/hotcloud19-paper-young.pdf">gvisor</a>.</p>
<p><code>SECCOMP_RET_TRAP</code> doesn't have the same high-level guides that ptrace
does so maybe I'll dig into it later. And symbolic analysis might be
able to detect bad workloads but it also isn't fault injection. Maybe
it's the better idea but fault injection just sounds more fun.</p>
<p>So this particular post will cover intercepting system calls
(syscalls) using ptrace with code written in Zig. Not because readers
will likely write their own code in Zig but because hopefully the Zig
code will be easier for you to read and adapt to your language
compared to if we had to deal with the verbosity and inconvenience of
C.</p>
<p>In the end, we'll be able to intercept and force short (incomplete)
writes in a Go, Python, and C program, emulating a disk that is having
an issue completing the write. This is a case that isn't common, but
should probably be handled with retries in production code.</p>
<p>This post corresponds roughly to <a href="https://github.com/eatonphil/badio/tree/720c3ee0482e6dcb1dd49d1789bccf86747b7776">this
commit</a>
on GitHub.</p>
<h3 id="a-bad-program">A bad program</h3><p>First off, let's write some code for a program that would exhibit a
short write. Basically, we write to a file and don't check how many
bytes we wrote. This is extremely common code; or at least I've
written it often.</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="nx">cat</span><span class="w"> </span><span class="nx">test</span><span class="o">/</span><span class="nx">main</span><span class="p">.</span><span class="k">go</span>
<span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">OpenFile</span><span class="p">(</span><span class="s">"test.txt"</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">O_RDWR</span><span class="o">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_CREATE</span><span class="o">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_TRUNC</span><span class="p">,</span><span class="w"> </span><span class="mo">0755</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">text</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">"some great stuff"</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">text</span><span class="p">))</span>
<span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="p">}</span>
</pre></div>
<p>With this code, if the <code>Write()</code> call doesn't actually succeed in
writing everything, we won't know that. And the file will contain less
than all of <code>some great stuff</code>.</p>
<p>This logical mistake will happen rarely, if ever, on a normal
disk. But it is possible.</p>
<p>Now that we've got an example program in mind, let's see if we can
trigger the logic error.</p>
<h3 id="ptrace">ptrace</h3><p>ptrace is a somewhat cross-platform layer that allows you to intercept
syscalls in a process. You can read and modify memory and registers in
the process, both when a syscall starts and before it finishes.</p>
<p>gdb and strace both use ptrace for their magic.</p>
<p>Google's gvisor that <a href="https://cloud.google.com/run/docs/container-contract">powers various serverless runtimes in Google
Cloud</a> was also
historically based on ptrace (<code>PTRACE_SYSEMU</code> specifically, which we
won't explore much in this post).</p>
<p class="note">
Interestingly though,
gvisor <a href="https://gvisor.dev/blog/2023/04/28/systrap-release/">switched
only this year</a> (2023) to a different default backend for
trapping system calls, based
on <a href="https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt"><code>SECCOMP_RET_TRAP</code></a>.
<br />
<br />
You can get similar vibes
from <a href="https://www.brendangregg.com/blog/2014-05-11/strace-wow-much-syscall.html">this
Brendan Gregg post</a> on the dangers of using strace (that is based
on ptrace) in production.
</p><p>Although ptrace is cross-platform, actually writing
cross-platform-aware code with ptrace can be complex. So this post
assumes amd64/linux.</p>
<h3 id="protocol">Protocol</h3><p>The ptrace protocol is described in the <a href="https://man7.org/linux/man-pages/man2/ptrace.2.html">ptrace
manpage</a>, but
<a href="https://nullprogram.com/blog/2018/06/23/">Chris Wellons</a> and <a href="https://webdocs.cs.ualberta.ca/~paullu/C498/meng.ptrace.slides.pdf">a
University of Alberta
group</a>
also wrote nice introductions. I referenced these three pages
heavily.</p>
<p>Here's what the UAlberta page has to say:</p>
<p><img src="/assets/ptraceprotocol.webp" alt="ptrace's syscall tracing protocol"></p>
<p>We fork and have the child call <code>PTRACE_TRACEME</code>. Then we handle each
syscall entrance by calling <code>PTRACE_SYSCALL</code> and waiting with <code>wait</code>
until the child has entered the syscall. It is at this moment that we can
mess with things.</p>
<h3 id="implementation">Implementation</h3><p>Let's turn that graphic into Zig code.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@cImport</span><span class="p">({</span>
<span class="w"> </span><span class="nb">@cInclude</span><span class="p">(</span><span class="s">"sys/ptrace.h"</span><span class="p">);</span>
<span class="w"> </span><span class="nb">@cInclude</span><span class="p">(</span><span class="s">"sys/user.h"</span><span class="p">);</span>
<span class="w"> </span><span class="nb">@cInclude</span><span class="p">(</span><span class="s">"sys/wait.h"</span><span class="p">);</span>
<span class="w"> </span><span class="nb">@cInclude</span><span class="p">(</span><span class="s">"errno.h"</span><span class="p">);</span>
<span class="p">});</span>
<span class="kr">const</span><span class="w"> </span><span class="n">cNullPtr</span><span class="o">:</span><span class="w"> </span><span class="o">?*</span><span class="n">anyopaque</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="c1">// TODO //</span>
<span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">arena</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">process</span><span class="p">.</span><span class="n">argsAlloc</span><span class="p">(</span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">());</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">2</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">fork</span><span class="p">();</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pid</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Fork failed!</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pid</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Child process</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">ptrace</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">PTRACE_TRACEME</span><span class="p">,</span><span class="w"> </span><span class="n">pid</span><span class="p">,</span><span class="w"> </span><span class="n">cNullPtr</span><span class="p">,</span><span class="w"> </span><span class="n">cNullPtr</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">process</span><span class="p">.</span><span class="n">execv</span><span class="p">(</span>
<span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">(),</span>
<span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">..],</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Parent process</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">childPid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pid</span><span class="p">;</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">waitpid</span><span class="p">(</span><span class="n">childPid</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">cm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">arena</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&</span><span class="n">arena</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">childPid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">childPid</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childInterceptSyscalls</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>So like the graphic suggested, we fork and start a child process. That
means this Zig program should be called like:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>--library<span class="w"> </span>c<span class="w"> </span>main.zig
$<span class="w"> </span>./main<span class="w"> </span>/actual/program/to/intercept<span class="w"> </span>--and<span class="w"> </span>--its<span class="w"> </span>args
</pre></div>
<p>Presumably, as with strace or gdb, we could instead attach to an
already running process with <code>PTRACE_ATTACH</code> or <code>PTRACE_SEIZE</code> (based
on the <a href="https://man7.org/linux/man-pages/man2/ptrace.2.html">ptrace
manpage</a>) rather
than going the <code>PTRACE_TRACEME</code> route. But I haven't tried that out
yet.</p>
<p>With the child ready to be intercepted, we can implement the
<code>ChildManager</code> that actually does the interception.</p>
<h4 id="childmanager">ChildManager</h4><p>The core of the <code>ChildManager</code> is an infinite loop (at least as long
as the child process lives) that waits for the next syscall and then
calls a hook for the system call if it exists.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">ChildManager</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">arena</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">,</span>
<span class="w"> </span><span class="n">childPid</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">pid_t</span><span class="p">,</span>
<span class="w"> </span><span class="c1">// TODO //</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">childInterceptSyscalls</span><span class="p">(</span>
<span class="w"> </span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ChildManager</span><span class="p">,</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Handle syscall entrance</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childWaitForSyscall</span><span class="p">();</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">W</span><span class="p">.</span><span class="n">IFEXITED</span><span class="p">(</span><span class="n">status</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="o">:</span><span class="w"> </span><span class="n">ABIArguments</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">getABIArguments</span><span class="p">();</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">syscall</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">syscall</span><span class="p">();</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">hooks</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">hook</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">syscall</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">hook</span><span class="p">.</span><span class="n">syscall</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">hook</span><span class="p">.</span><span class="n">hook</span><span class="p">(</span><span class="n">cm</span><span class="p">.</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">args</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">};</span>
</pre></div>
<p>Later we'll write a hook for the <code>sys_write</code> syscall that
will force an incomplete write.</p>
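<p>To make the hook idea concrete before getting there, here is a rough sketch in Go of one shape such a table could take. Everything in it is an assumption made for illustration: the <code>regs</code> struct is a hypothetical stand-in holding only a few registers, <code>halveWrite</code> is a made-up hook, and no real process is traced. It relies on two facts about amd64/linux: <code>write</code> is syscall number 1, and its third argument (the byte count) is passed in <code>rdx</code>.</p>

```go
package main

import "fmt"

// regs holds only the registers this sketch needs. The fields are
// hypothetical stand-ins for the matching user_regs_struct fields.
type regs struct {
	OrigRax uint64 // syscall number
	Rdi     uint64 // first argument (fd for write)
	Rsi     uint64 // second argument (buffer address for write)
	Rdx     uint64 // third argument (byte count for write)
}

const sysWrite = 1 // write(2) on amd64/linux

// hook pairs a syscall number with a function that may edit the registers.
type hook struct {
	syscall uint64
	fn      func(r *regs)
}

// halveWrite shrinks the requested byte count so that only part of the
// buffer would be written. (Hypothetical hook, for illustration only.)
func halveWrite(r *regs) {
	if r.Rdx > 1 {
		r.Rdx /= 2
	}
}

var hooks = []hook{{syscall: sysWrite, fn: halveWrite}}

// dispatch runs every hook registered for the intercepted syscall and
// reports whether any hook ran.
func dispatch(r *regs) bool {
	ran := false
	for _, h := range hooks {
		if h.syscall == r.OrigRax {
			h.fn(r)
			ran = true
		}
	}
	return ran
}

func main() {
	// Pretend the child asked to write 16 bytes to fd 3.
	r := regs{OrigRax: sysWrite, Rdi: 3, Rdx: 16}
	fmt.Println(dispatch(&r), r.Rdx)
}
```

<p>In a real tracer the edited registers would still have to be written back to the child (ptrace has a <code>PTRACE_SETREGS</code> request for that). This sketch stops at the dispatch logic.</p>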
<p>Back to the protocol, <code>childWaitForSyscall</code> will call <code>PTRACE_SYSCALL</code>
to allow the child process to start up again and continue until the
next syscall. We'll follow that by <code>wait</code>-ing for the child
process to be stopped again so we can handle the syscall entrance.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">childWaitForSyscall</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">)</span><span class="w"> </span><span class="kt">u32</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">status</span><span class="o">:</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">ptrace</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">PTRACE_SYSCALL</span><span class="p">,</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childPid</span><span class="p">,</span><span class="w"> </span><span class="n">cNullPtr</span><span class="p">,</span><span class="w"> </span><span class="n">cNullPtr</span><span class="p">);</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">waitpid</span><span class="p">(</span><span class="n">cm</span><span class="p">.</span><span class="n">childPid</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">status</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@bitCast</span><span class="p">(</span><span class="n">status</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Now that we've intercepted a syscall (after <code>waitpid</code> finishes
blocking), we need to figure out what syscall it was. We do this by
calling <code>PTRACE_GETREGS</code> and reading the <code>rax</code> register which
according to the <a href="https://stackoverflow.com/a/54957101/1507139">amd64/linux calling
convention</a> holds the number of the
syscall being called. (ptrace reports that number in the <code>orig_rax</code>
field, since <code>rax</code> itself is reused for the return value.)</p>
<h4 id="registers">Registers</h4><p><code>PTRACE_GETREGS</code> fills out the <a href="https://sites.uclouvain.be/SystInfo/usr/include/sys/user.h.html">following
struct</a>:</p>
<div class="highlight"><pre><span></span><span class="k">struct</span><span class="w"> </span><span class="nc">user_regs_struct</span>
<span class="p">{</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r15</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r14</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r13</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r12</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rbp</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rbx</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r11</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r10</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r9</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r8</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rax</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rcx</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rdx</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rsi</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rdi</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">orig_rax</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rip</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">cs</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">eflags</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rsp</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">ss</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">fs_base</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">gs_base</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">ds</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">es</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">fs</span><span class="p">;</span>
<span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">gs</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
<p>Let's write a little amd64/linux-specific wrapper for accessing
meaningful fields.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">ABIArguments</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">regs</span><span class="o">:</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">user_regs_struct</span><span class="p">,</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">nth</span><span class="p">(</span><span class="n">aa</span><span class="o">:</span><span class="w"> </span><span class="n">ABIArguments</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="kt">c_ulong</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">4</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rdi</span><span class="p">,</span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rsi</span><span class="p">,</span>
<span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rdx</span><span class="p">,</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">setNth</span><span class="p">(</span><span class="n">aa</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mi">3</span><span class="p">);</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rdi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rsi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rdx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">result</span><span class="p">(</span><span class="n">aa</span><span class="o">:</span><span class="w"> </span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">c_ulong</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rax</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">setResult</span><span class="p">(</span><span class="n">aa</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">syscall</span><span class="p">(</span><span class="n">aa</span><span class="o">:</span><span class="w"> </span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">c_ulong</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">orig_rax</span><span class="p">;</span><span class="w"> </span><span class="p">}</span>
<span class="p">};</span>
</pre></div>
<p>One thing to note is that the field we read to get the syscall number
is not <code>aa.regs.rax</code> but <code>aa.regs.orig_rax</code>. This is because <code>rax</code> also
holds the return value, and a <code>PTRACE_SYSCALL</code> stop happens twice for
most syscalls: once on entry and once on exit. The <code>orig_rax</code> field preserves the
<code>rax</code> value from syscall entry. You can read more about this
<a href="https://stackoverflow.com/questions/6468896/why-is-orig-eax-provided-in-addition-to-eax/6469069#6469069">here</a>.</p>
<h4 id="getting-and-setting-registers">Getting and setting registers</h4><p>Now let's write the <code>ChildManager</code> code that actually calls
<code>PTRACE_GETREGS</code> to fill out one of these structs.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">getABIArguments</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">)</span><span class="w"> </span><span class="n">ABIArguments</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ABIArguments</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">regs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">ptrace</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">PTRACE_GETREGS</span><span class="p">,</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childPid</span><span class="p">,</span><span class="w"> </span><span class="n">cNullPtr</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">args</span><span class="p">.</span><span class="n">regs</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">args</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Setting registers is similar. We just pass the struct back and call
<code>PTRACE_SETREGS</code> instead:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">setABIArguments</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">ptrace</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">PTRACE_SETREGS</span><span class="p">,</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childPid</span><span class="p">,</span><span class="w"> </span><span class="n">cNullPtr</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">args</span><span class="p">.</span><span class="n">regs</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<h4 id="a-hook">A hook</h4><p>Now we finally have enough code to write a hook that can get and set
registers; i.e. manipulate a system call!</p>
<p>We'll start by registering a <code>sys_write</code> hook in the <code>hooks</code> field we
check in <code>childInterceptSyscalls</code> above.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">hooks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&</span><span class="p">[</span><span class="n">_</span><span class="p">]</span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">syscall</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="p">,</span>
<span class="w"> </span><span class="n">hook</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="p">(</span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">anyerror</span><span class="o">!</span><span class="kt">void</span><span class="p">,</span>
<span class="w"> </span><span class="p">}{.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">syscall</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@intFromEnum</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">syscalls</span><span class="p">.</span><span class="n">X64</span><span class="p">.</span><span class="n">write</span><span class="p">),</span>
<span class="w"> </span><span class="p">.</span><span class="n">hook</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">writeHandler</span><span class="p">,</span>
<span class="w"> </span><span class="p">}};</span>
</pre></div>
<p>If we look at the <a href="https://man7.org/linux/man-pages/man2/write.2.html">manpage for
<code>write</code></a> we see it
takes three arguments:</p>
<ol>
<li>The file descriptor (fd) to write to</li>
<li>The address to start writing data from</li>
<li>And the number of bytes to write</li>
</ol>
<p>Going back to the <a href="https://stackoverflow.com/questions/2535989/what-are-the-calling-conventions-for-unix-linux-system-calls-and-user-space-f">calling
convention</a>
that means the fd will be in <code>rdi</code>, the data address in <code>rsi</code>, and the
data length in <code>rdx</code>.</p>
<p>So if we shorten the data length, we should be creating a short
(incomplete) write.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeHandler</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="n">entryArgs</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">anyerror</span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">dataAddress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
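<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Not used yet; discard them since Zig rejects unused locals</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fd</span><span class="p">;</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dataAddress</span><span class="p">;</span>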
<span class="w"> </span><span class="c1">// Truncate some bytes</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dataLength</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span>
<span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">setNth</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span>
<span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="n">entryArgs</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>In a more sophisticated version of this program, we could randomly
decide when to truncate data and randomly decide how much data to
truncate. However, for our purposes this is sufficient.</p>
<p>But there are some real problems with this code. When I ran this
program against a basic Go program, I saw duplicate write calls reported.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Ah ok, PTRACE_SYSCALL gets hit when you both enter and exit a syscall.<br><br>So each time you call PTRACE_SYSCALL and you do stuff, you just call it again afterwards to handle/wait for the exit. <a href="https://t.co/PjmNwcMepG">pic.twitter.com/PjmNwcMepG</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1707846783035183267?ref_src=twsrc%5Etfw">September 29, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>So the deal with <code>PTRACE_SYSCALL</code> is that for (most?) syscalls, the
traced process stops twice. You get to modify data before the kernel
actually handles the syscall, and you get to modify data again after
the kernel has finished it.</p>
<p>This makes sense because <code>PTRACE_SYSCALL</code> (unlike <code>PTRACE_SYSEMU</code>)
allows the syscall to actually happen. And if we wanted to, for
example, modify the syscall's return value, we'd have to do that after the
syscall was done, not before it started. We are modifying registers
directly, after all.</p>
<p>All this means for our Zig code is that when we handle <code>sys_write</code>, we
need to call <code>PTRACE_SYSCALL</code> again to process the syscall
exit. Otherwise we'd reach this <code>writeHandler</code> for both entries and
exits, which would require some additional way of disambiguating
entrances from exits.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeHandler</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="n">entryArgs</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">anyerror</span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">dataAddress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Truncate some bytes</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dataLength</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span>
<span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">setNth</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span>
<span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="n">entryArgs</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childReadData</span><span class="p">(</span><span class="n">dataAddress</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Got a write on {}: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">items</span><span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="c1">// Handle syscall exit</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childWaitForSyscall</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
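<p>As for that "additional way of disambiguating": one heuristic, sketched
here under the assumption of x86-64 Linux, relies on the kernel placing
<code>-ENOSYS</code> in <code>rax</code> before it reports a syscall-entry stop. The helper
name <code>looksLikeSyscallEntry</code> is hypothetical and not part of the program
in this post, and the check can be fooled by a syscall that genuinely
fails with <code>ENOSYS</code> on exit.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// ENOSYS is 38 on Linux; hard-coded to keep the sketch self-contained.
const ENOSYS: c_long = 38;

// Hypothetical helper: guess whether a PTRACE_SYSCALL stop is a syscall
// entry by checking whether rax holds -ENOSYS.
fn looksLikeSyscallEntry(rax: c_ulong) bool {
    const asSigned: c_long = @bitCast(rax);
    return asSigned == -ENOSYS;
}

test "entry stops carry -ENOSYS in rax" {
    const entryRax: c_ulong = @bitCast(-ENOSYS);
    try std.testing.expect(looksLikeSyscallEntry(entryRax));
    // A successful 10-byte write leaves 10 in rax at the exit stop.
    try std.testing.expect(!looksLikeSyscallEntry(10));
}
</pre></div>
<p>Newer kernels (Linux 5.3 and later) also offer
<code>PTRACE_GET_SYSCALL_INFO</code>, which reports entry versus exit directly and
avoids the guesswork.</p>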
<p class="note">
We could put the <code>cm.childWaitForSyscall()</code> call that waits for
the syscall exit in the main loop, and I did try that at
first. However, not all syscalls seemed to produce both an entry and an
exit stop, and this resulted in the hooks sometimes starting with a
syscall exit rather than a syscall entry. So rather than making the
code more complicated, I decided to only wait for the exit on
syscalls I knew had an exit (by observation at least), like
<code>sys_write</code>.
</p><h3 id="multiple-writes?-no-bad-logic?">Multiple writes? No bad logic?</h3><p>So I had this code as is, correctly handling syscall entrances and
exits, but I was seeing multiple write calls. And the text file I was
writing to had the complete text I wanted to write. There was no short
write even though I truncated the data length.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Ok so what happens in this Go program if I truncate the amount of data?<br><br>I assumed Go would do nothing since all I did was call `f.Write()` once and `f.Write()` returns a number of bytes written.<br><br>But actually, it still writes everything! <a href="https://t.co/OSalKEbERM">pic.twitter.com/OSalKEbERM</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1707854642250408119?ref_src=twsrc%5Etfw">September 29, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>This took some digging into Go source code to understand. If you trace
what <code>os.File.Write()</code> does on Linux you eventually get to
<a href="https://cs.opensource.google/go/go/+/refs/tags/go1.21.1:src/internal/poll/fd_unix.go">src/internal/poll/fd_unix.go</a>:</p>
<div class="highlight"><pre><span></span><span class="c1">// Write implements io.Writer.</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fd</span><span class="w"> </span><span class="o">*</span><span class="nx">FD</span><span class="p">)</span><span class="w"> </span><span class="nx">Write</span><span class="p">(</span><span class="nx">p</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">writeLock</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">writeUnlock</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">pd</span><span class="p">.</span><span class="nx">prepareWrite</span><span class="p">(</span><span class="nx">fd</span><span class="p">.</span><span class="nx">isFile</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">nn</span><span class="w"> </span><span class="kt">int</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">max</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">p</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">IsStream</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">max</span><span class="o">-</span><span class="nx">nn</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nx">maxRW</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">max</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">nn</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">maxRW</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ignoringEINTRIO</span><span class="p">(</span><span class="nx">syscall</span><span class="p">.</span><span class="nx">Write</span><span class="p">,</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">Sysfd</span><span class="p">,</span><span class="w"> </span><span class="nx">p</span><span class="p">[</span><span class="nx">nn</span><span class="p">:</span><span class="nx">max</span><span class="p">])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">nn</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">n</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">nn</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">p</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">nn</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">syscall</span><span class="p">.</span><span class="nx">EAGAIN</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">pd</span><span class="p">.</span><span class="nx">pollable</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">pd</span><span class="p">.</span><span class="nx">waitWrite</span><span class="p">(</span><span class="nx">fd</span><span class="p">.</span><span class="nx">isFile</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">nn</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">nn</span><span class="p">,</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">ErrUnexpectedEOF</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>This might be common knowledge but I didn't realize Go did this. And
when I tried out the same basic program in Python and even C, the
behavior was the same. The builtin <code>write()</code> behavior on a file (in
many languages, apparently) is to retry until all data is written, with
some exceptions.</p>
<p>This makes sense since files on disk, unlike file descriptors backed
by network sockets, are almost always available. Compared to a
network connection, disks are physically close and almost always
stay connected. (With some obvious exceptions like
network-attached storage and thumb drives.)</p>
<p>So to trigger the short write, the easiest way seems to be to have the
<code>sys_write</code> call return an error that is NOT <code>EAGAIN</code>, since the code
will retry if that is the error.</p>
<p>After looking through the <a href="https://man7.org/linux/man-pages/man2/write.2.html#ERRORS">list of errors that sys_write can
return</a>,
<code>EIO</code> seems like a nice one.</p>
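<p>A detail worth keeping in mind here: on Linux the raw syscall reports
failure by leaving the negated error number in <code>rax</code>, and the libc or
language runtime wrapper turns that back into a positive errno. A minimal
sketch of that encoding, assuming amd64 Linux where <code>EIO</code> is 5 (the
helper <code>errnoToResult</code> is hypothetical, not something from the code
above):</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// EIO is 5 on Linux; hard-coded so the sketch stands alone.
const EIO: c_ulong = 5;

// Hypothetical helper: encode an errno the way the kernel leaves it in
// rax, i.e. as the two's complement of the error number.
fn errnoToResult(errno: c_ulong) c_ulong {
    return @as(c_ulong, 0) -% errno;
}

test "EIO is encoded as -5" {
    const result = errnoToResult(EIO);
    const asSigned: c_long = @bitCast(result);
    try std.testing.expectEqual(@as(c_long, -5), asSigned);
}
</pre></div>
<p>Wrapping subtraction (<code>-%</code>) is used because the register value is
unsigned, so a plain <code>-</code> would trip Zig's overflow check.</p>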
<p>So let's do our final version of <code>writeHandler</code> and on the syscall
exit, we'll modify the return value (<code>rax</code> in amd64/linux) to be
<code>-EIO</code>, since the kernel reports errors as negated errno values.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeHandler</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="n">entryArgs</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">anyerror</span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">dataAddress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Truncate some bytes</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dataLength</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span>
<span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">setNth</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span>
<span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="n">entryArgs</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Handle syscall exit</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childWaitForSyscall</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">exitArgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">getABIArguments</span><span class="p">();</span>
<span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">exitArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dataLength</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Force the writes to stop after the first one by returning EIO.</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">result</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">-%</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">EIO</span><span class="p">;</span>
<span class="w"> </span><span class="n">exitArgs</span><span class="p">.</span><span class="n">setResult</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
<span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="o">&</span><span class="n">exitArgs</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
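<p>The <code>0 -% c.EIO</code> arithmetic works because Linux syscalls report failure by returning a negated errno in the result register: values from -4095 through -1 are errors and anything else is a normal result. Here is a small sketch of that encoding in Go (the language of the test program). The helper names <code>encodeErrno</code> and <code>decodeResult</code> are made up for illustration and are not part of the fault injector:</p>

```go
package main

import "fmt"

// eio is the Linux errno value for EIO ("Input/output error").
const eio = 5

// maxErrno is the largest errno the kernel encodes in a return value.
const maxErrno = 4095

// encodeErrno returns the raw register value for a failed syscall:
// the negated errno, wrapped around as an unsigned 64-bit integer.
// This is the same arithmetic as `0 -% c.EIO` in the Zig code.
func encodeErrno(errno uint64) uint64 {
	return 0 - errno
}

// decodeResult splits a raw return value into (result, errno).
// errno is 0 when the syscall succeeded; result is -1 when it failed.
func decodeResult(raw uint64) (n int64, errno uint64) {
	signed := int64(raw)
	if signed < 0 && signed >= -maxErrno {
		return -1, uint64(-signed)
	}
	return signed, 0
}

func main() {
	raw := encodeErrno(eio)
	n, errno := decodeResult(raw)
	fmt.Printf("raw=%#x n=%d errno=%d\n", raw, n, errno)

	// A successful 14-byte write decodes to itself with no errno.
	n, errno = decodeResult(14)
	fmt.Printf("raw=%#x n=%d errno=%d\n", uint64(14), n, errno)
}
```

<p>This is also roughly what a libc wrapper does on the way out of a syscall: a raw value in the error range becomes a <code>-1</code> return plus <code>errno</code>.</p>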
<p>Let's give it a whirl!</p>
<h3 id="all-together">All together</h3><p>Build the Zig fault injector and the Go test code:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>zig<span class="w"> </span>build-exe<span class="w"> </span>--library<span class="w"> </span>c<span class="w"> </span>main.zig
<span class="gp">$ </span><span class="o">(</span><span class="w"> </span><span class="nb">cd</span><span class="w"> </span><span class="nb">test</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>main.go<span class="w"> </span><span class="o">)</span>
</pre></div>
<p>And run:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./main<span class="w"> </span>test/main
</pre></div>
<p>And check <code>test.txt</code>:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.txt
some<span class="w"> </span>great<span class="w"> </span>stu
</pre></div>
<p>Hey, that's a short write! :)</p>
<h3 id="sidenote:-reading-data-from-the-child">Sidenote: Reading data from the child</h3><p>We accomplished everything we set out to, but there's one other useful
thing we can do: reading the actual data passed to the write syscall.</p>
<p>Just like how we can get the child process registers with
<code>PTRACE_GETREGS</code>, we can read child memory with
<code>PTRACE_PEEKDATA</code>. <code>PTRACE_PEEKDATA</code> takes the child process id and
the memory address in the child to read from. It returns a word of
data (which on amd64/linux is 8 bytes).</p>
<p>We can use the syscall arguments (data address and length) to keep
calling <code>PTRACE_PEEKDATA</code> on the child until we've read all bytes of
the data the child process wanted to write:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">childReadData</span><span class="p">(</span>
<span class="w"> </span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span>
<span class="w"> </span><span class="n">address</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="p">,</span>
<span class="w"> </span><span class="n">length</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="p">,</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">cm</span><span class="p">.</span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">());</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">ptrace</span><span class="p">(</span>
<span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">PTRACE_PEEKDATA</span><span class="p">,</span>
<span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childPid</span><span class="p">,</span>
<span class="w"> </span><span class="n">address</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="p">,</span>
<span class="w"> </span><span class="n">cNullPtr</span><span class="p">,</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">asBytes</span><span class="p">(</span><span class="o">&</span><span class="n">word</span><span class="p">))</span><span class="w"> </span><span class="o">|</span><span class="n">byte</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">byte</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">data</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
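<p>The same loop is easy to exercise without a tracee. Below is a sketch in Go where a hypothetical <code>peekWord</code> callback stands in for <code>PTRACE_PEEKDATA</code>, backed by a fake in-memory "child". All names here are assumptions for illustration. Unlike the Zig version above, it also propagates a read error, since a real <code>PTRACE_PEEKDATA</code> call can fail (for example on an unmapped address):</p>

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const wordSize = 8 // bytes per word on amd64/linux

// peekWord stands in for PTRACE_PEEKDATA: it returns one word read
// from the given address in the (fake) child.
type peekWord func(address uint64) (uint64, error)

// readData reads length bytes starting at address, one word at a
// time, dropping the unused tail of the final word.
func readData(peek peekWord, address, length uint64) ([]byte, error) {
	data := make([]byte, 0, length)
	for uint64(len(data)) < length {
		word, err := peek(address + uint64(len(data)))
		if err != nil {
			return nil, err
		}
		var buf [wordSize]byte
		binary.LittleEndian.PutUint64(buf[:], word) // amd64 is little-endian
		remaining := length - uint64(len(data))
		if remaining > wordSize {
			remaining = wordSize
		}
		data = append(data, buf[:remaining]...)
	}
	return data, nil
}

// fakeChild builds a peekWord over a byte slice that pretends to be
// the child's memory, mapped at base.
func fakeChild(base uint64, mem []byte) peekWord {
	return func(address uint64) (uint64, error) {
		if address < base || address-base >= uint64(len(mem)) {
			return 0, errors.New("address not mapped")
		}
		var buf [wordSize]byte
		copy(buf[:], mem[address-base:]) // zero-padded past the end
		return binary.LittleEndian.Uint64(buf[:]), nil
	}
}

func main() {
	peek := fakeChild(0x1000, []byte("some great stuff"))
	data, err := readData(peek, 0x1000, 14)
	fmt.Printf("%q %v\n", data, err)
}
```

<p>Reading 14 of the 16 bytes takes two word reads, and the last two bytes of the second word are discarded.</p>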
<p>And we could modify <code>writeHandler</code> to print out the data passed to each write, after truncation (for debugging):</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeHandler</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="n">entryArgs</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">anyerror</span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">dataAddress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Truncate some bytes</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dataLength</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span>
<span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">setNth</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span>
<span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="n">entryArgs</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childReadData</span><span class="p">(</span><span class="n">dataAddress</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Got a write on {}: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">items</span><span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="c1">// Handle syscall exit</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childWaitForSyscall</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">exitArgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">getABIArguments</span><span class="p">();</span>
<span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">exitArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dataLength</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Force the writes to stop after the first one by returning EIO.</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">result</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">-%</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">EIO</span><span class="p">;</span>
<span class="w"> </span><span class="n">exitArgs</span><span class="p">.</span><span class="n">setResult</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
<span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="o">&</span><span class="n">exitArgs</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>That's pretty neat!</p>
<h3 id="next-steps">Next steps</h3><p>Short writes are just one of many bad IO interactions. Another fun one
would be to completely buffer all writes on a file descriptor (not
allowing anything to be written to disk at all) until fsync is called
on the file descriptor. Or <a href="https://www.usenix.org/conference/atc20/presentation/rebello">forcing fsyncs to
fail</a>.</p>
<p>An interesting optimization would be to apply <a href="https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt">seccomp
filters</a>
so that rather than paying a penalty for watching every syscall, I
only get notified about the ones I have hooks for like
<code>sys_write</code>. <a href="https://www.alfonsobeato.net/c/filter-and-modify-system-calls-with-seccomp-and-ptrace/">Here's another
post</a>
that explores ptrace with seccomp filters.</p>
<p>Credits: Thank you Charlie Cummings and Paul Khuong for reviewing a draft
of this post!</p>
<h3 id="selected-responses-after-publication">Selected responses after publication</h3><ul>
<li>oscooter on Reddit <a href="https://www.reddit.com/r/linux/comments/16x32l3/comment/k380m9q/?utm_source=reddit&utm_medium=web2x&context=3">gave some
tips</a>
on using ptrace, including using <code>process_vm_readv</code> instead of
<code>PTRACE_PEEKDATA</code> to read memory from the tracee process.</li>
</ul>
http://notes.eatonphil.com/2023-10-01-intercepting-and-modifying-linux-system-calls-with-ptrace.htmlSun, 01 Oct 2023 00:00:00 +0000
- How do databases execute expressions?http://notes.eatonphil.com/2023-09-21-how-do-databases-execute-expressions.html<p>Databases are fun. They sit at the confluence of Computer
Science topics that might otherwise not seem practical in life as a
developer. For example, every database with a query language is also a
programming language implementation of some caliber. That doesn't
include all databases though of course; see: RocksDB, FoundationDB,
TigerBeetle, etc.</p>
<p>This post looks at how various databases execute expressions in their
query language.</p>
<p>tl;dr: Most surveyed databases use a tree-walking interpreter. A few
use stack- or register-based virtual machines. A couple have
just-in-time compilers. And, tangentially, a few do vectorized
interpretation.</p>
<p class="note">
Throughout this post I'll use "virtual machine" as a shorthand for
stack- or register-based loops that process a linearized
set of instructions. I say this since it is sometimes fair to call a
tree-walking interpreter a virtual machine. But that is not what I
mean when I say virtual machine in this post.
</p><h3 id="stepping-back">Stepping back</h3><p>Programming languages are typically implemented by turning an
Abstract Syntax Tree (AST) into a linear set of instructions
for a virtual machine (e.g. CPython, Java, C#) or native code
(e.g. GCC's C compiler, Go, Rust). Some of the former implementations
also generate and run Just-In-Time (JIT) compiled native code
(e.g. Java and C#).</p>
<p>These days it is less common for a programming language
implementation to interpret directly off the AST or some other
tree-like intermediate representation. This style is often called
tree-walking.</p>
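<p>To make "tree-walking" concrete, here is a minimal sketch in Go of an interpreter for arithmetic expressions. The <code>Expr</code>, <code>Lit</code>, and <code>BinOp</code> types are invented for this example and are not taken from any of the projects discussed in this post. Notice that evaluation is a recursive function that evaluates its children and returns a value:</p>

```go
package main

import "fmt"

// Expr is a node in the expression tree.
type Expr interface {
	Eval() float64
}

// Lit is a literal number.
type Lit float64

func (l Lit) Eval() float64 { return float64(l) }

// BinOp is a binary operation over two sub-expressions.
type BinOp struct {
	Op          byte // one of '+', '-', '*', '/'
	Left, Right Expr
}

// Eval walks the tree: it recursively evaluates both children and
// then combines their results.
func (b BinOp) Eval() float64 {
	l := b.Left.Eval()
	r := b.Right.Eval()
	switch b.Op {
	case '+':
		return l + r
	case '-':
		return l - r
	case '*':
		return l * r
	case '/':
		return l / r
	}
	panic(fmt.Sprintf("unknown operator %q", b.Op))
}

func main() {
	// (1 + 2) * 4
	tree := BinOp{'*', BinOp{'+', Lit(1), Lit(2)}, Lit(4)}
	fmt.Println(tree.Eval())
}
```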
<p>Shell languages sometimes do tree-walking. Otherwise, implementations
that interpret directly off of a tree normally do so as a short-term
measure before switching to compiled virtual machine code or JIT-ed
native code (e.g. some JavaScript implementations, GraalVM, RPython,
etc.).</p>
<p>That is, while some major programming language implementations started
out with tree-walking interpreters, they mostly moved away from solely
tree-walking over a decade ago. See <a href="https://www.webkit.org/blog/189/announcing-squirrelfish/">JSC in
2008</a>, <a href="https://www.infoq.com/news/2007/12/ruby-19/">Ruby
in 2007</a>, etc.</p>
<p>My intuition is that tree-walking takes up more memory and is less
cache-friendly than the linear instructions you give to a virtual
machine or to your CPU. There are <a href="https://stefan-marr.de/downloads/oopsla23-larose-et-al-ast-vs-bytecode-interpreters-in-the-age-of-meta-compilation.pdf">some folks who
disagree</a>,
but they mostly talk about tree-walking when you've also got a JIT
compiler hooked up. Which isn't quite the same thing. There has also
been <a href="https://www.cs.cornell.edu/~asampson/blog/flattening.html">some early exploration and
improvements</a>
reported when tree-walking with a tree organized as an array.</p>
<h4 id="and-databases?">And databases?</h4><p>Databases often interpret directly off a tree. (It isn't, generally
speaking, fair to say they are AST-walking interpreters because
databases typically transform and optimize beyond just an AST as
parsed from user code.)</p>
<p>But not all databases interpret a tree. Some have a virtual
machine. And some generate and run JIT-ed native code.</p>
<h3 id="methodology">Methodology</h3><p>If a core function (in the query execution path that does something
like arithmetic or comparison) returns a value, that's a sign it's a
tree-walking interpreter. Or, if you see code that is evaluating its
arguments during execution, that's also a sign of a tree-walking
interpreter.</p>
<p>On the other hand, if the function mutates internal state such as by
assigning a value to a context or pushing to a stack, that's a sign
it's a stack- or register-based virtual machine. If a function pulls
its arguments from memory and doesn't evaluate the arguments, that's
also an indication it's a stack- or register-based virtual machine.</p>
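<p>For contrast with a tree walker, here is a sketch in Go of a tiny stack-based virtual machine. The instruction set (<code>opPush</code>, <code>opAdd</code>, <code>opMul</code>) is invented for this example. The instruction handlers return nothing: they pull already-computed operands off a stack and push a result back, which is the "mutates internal state" signal described above:</p>

```go
package main

import "fmt"

type opcode int

const (
	opPush opcode = iota // push the instruction's operand
	opAdd                // pop two values, push their sum
	opMul                // pop two values, push their product
)

type instr struct {
	op      opcode
	operand float64 // only used by opPush
}

type vm struct {
	stack []float64
}

func (m *vm) push(v float64) { m.stack = append(m.stack, v) }

func (m *vm) pop() float64 {
	v := m.stack[len(m.stack)-1]
	m.stack = m.stack[:len(m.stack)-1]
	return v
}

// run loops over a linear list of instructions. No handler evaluates
// its arguments; earlier instructions already left them on the stack.
func (m *vm) run(program []instr) float64 {
	for _, in := range program {
		switch in.op {
		case opPush:
			m.push(in.operand)
		case opAdd:
			r, l := m.pop(), m.pop()
			m.push(l + r)
		case opMul:
			r, l := m.pop(), m.pop()
			m.push(l * r)
		}
	}
	return m.pop()
}

func main() {
	// (1 + 2) * 4 in postfix order: 1 2 + 4 *
	program := []instr{
		{opPush, 1}, {opPush, 2}, {op: opAdd},
		{opPush, 4}, {op: opMul},
	}
	fmt.Println((&vm{}).run(program))
}
```

<p>When grepping a real codebase, a big <code>switch</code> over opcodes inside a loop like <code>run</code> is usually the thing to look for.</p>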
<p>This approach can result in false positives depending on the
architecture of the interpreter. User-defined functions (UDFs) would
probably accept evaluated arguments and return a value regardless of
how the interpreter is implemented. So it's important to find not just
functions that could be implemented like UDFs, but core builtin
behavior. Control flow implementations of functions like <code>if</code> or
<code>case</code> can be great places to look.</p>
<p>And tactically, I clone the source code and run stuff like <code>git grep
-i eval | grep -v test | grep \\.java | grep -i eval</code> or <code>git grep -i
expr | grep -v test | grep \\.go | grep -i expr</code> until I convince
myself I'm somewhere interesting.</p>
<p>Note: In talking about a broad swath of projects, maybe I've
misunderstood one or some. If you've got a correction, let me know! If
there's a proprietary database you work on where you can link to the
(publicly described) execution strategy, feel free to pass it along!
Or if I'm missing your public-source database in this list, send me a
message!</p>
<h3 id="survey">Survey</h3><h4><a href="https://github.com/cockroachdb/cockroach">Cockroach</a> (Ruling: Tree Walker)</h4><p>Judging by functions like <a href="https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sem/eval/expr.go#L105"><code>func (e *evaluator)
EvalBinaryExpr</code></a>
that <a href="https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sem/eval/expr.go#L106">evaluates the left-hand
side</a>
and then <a href="https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sem/eval/expr.go#L113">evaluates the right-hand
side</a>
and returns a value, Cockroach looks like a tree-walking interpreter.</p>
<p>It gets a little more interesting though, since Cockroach also
<a href="https://www.cockroachlabs.com/docs/stable/vectorized-execution">supports</a>
vectorized expression execution. Vectorizing is a fancy term for
acting on many pieces of data at once rather than one at a time. It
doesn't necessarily imply SIMD. Here is an example of a <a href="https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/colexec/colexecproj/proj_non_const_ops.eg.go#L4427">vectorized
addition</a>
of two int64 columns.</p>
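<p>As a rough illustration of the difference (a hand-written sketch in Go, not Cockroach's generated code), compare an operator that is invoked once per row with one that is invoked once per batch of rows. The function names are made up for this example:</p>

```go
package main

import "fmt"

// addRow is the row-at-a-time style: the interpreter calls it once
// per row, so any per-call overhead is paid on every row.
func addRow(a, b int64) int64 {
	return a + b
}

// addInt64Vec is the vectorized style: one call processes a whole
// batch of rows from two columns in a tight loop.
func addInt64Vec(a, b, out []int64) {
	for i := range a {
		out[i] = a[i] + b[i]
	}
}

func main() {
	colA := []int64{1, 2, 3, 4}
	colB := []int64{10, 20, 30, 40}

	// Row-at-a-time: one operator call per row.
	rowOut := make([]int64, len(colA))
	for i := range colA {
		rowOut[i] = addRow(colA[i], colB[i])
	}

	// Vectorized: one operator call per batch.
	vecOut := make([]int64, len(colA))
	addInt64Vec(colA, colB, vecOut)

	fmt.Println(rowOut, vecOut)
}
```

<p>Both produce the same result. The vectorized version just amortizes the cost of getting to the addition across the whole batch.</p>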
<h4><a href="https://github.com/ClickHouse/clickhouse">ClickHouse</a> (Ruling: Tree Walker + JIT)</h4><p>The ClickHouse architecture is somewhat unusual and difficult for me to
read through, likely because it is fairly mature and heavily
optimized. But they tend to document their header files well. So
files like
<a href="https://github.com/ClickHouse/ClickHouse/blob/853e3f0aa789d5b6dcb251a403276d9fdc02902c/src/Functions/IFunction.h">src/Functions/IFunction.h</a>
and
<a href="https://github.com/ClickHouse/ClickHouse/blob/9af9b4a08542812694f171833a7afe08f5aaaafb/src/Interpreters/ExpressionActions.h">src/Interpreters/ExpressionActions.h</a>
were helpful.</p>
<p>They have also spoken publicly about their pipeline execution model;
e.g. <a href="https://presentations.clickhouse.com/meetup24/5.%20Clickhouse%20query%20execution%20pipeline%20changes/">this
presentation</a>
and <a href="https://github.com/ClickHouse/ClickHouse/issues/34045">this roadmap
issue</a>. But it
isn't completely clear how much pipeline execution (which is broader
than just expression evaluation) connects to expression evaluation.</p>
<p>Moreover, they have <a href="https://clickhouse.com/blog/clickhouse-just-in-time-compiler-jit">publicly
spoken</a>
about their support for JIT compilation for query execution. But let's
look at how execution works when the JIT is not enabled. For example,
if we take a look at how <a href="https://github.com/ClickHouse/ClickHouse/blob/853e3f0aa789d5b6dcb251a403276d9fdc02902c/src/Functions/if.cpp"><code>if</code> is
implemented</a>,
we know that the <code>then</code> and <code>else</code> rows must be conditionally
evaluated.</p>
<p>In the runtime entrypoint,
<a href="https://github.com/ClickHouse/ClickHouse/blob/853e3f0aa789d5b6dcb251a403276d9fdc02902c/src/Functions/if.cpp#L1048"><code>executeImpl</code></a>,
we see the function call
<a href="https://github.com/ClickHouse/ClickHouse/blob/853e3f0aa789d5b6dcb251a403276d9fdc02902c/src/Functions/if.cpp#L983"><code>executeShortCircuitArguments</code></a>
which in turn calls
<a href="https://github.com/ClickHouse/ClickHouse/blob/master/src/Columns/ColumnFunction.cpp#L280"><code>ColumnFunction::reduce()</code></a>
which <a href="https://github.com/ClickHouse/ClickHouse/blob/master/src/Columns/ColumnFunction.cpp#L299">evaluates each column vector that is an
argument</a>
to the function and then calls execute on the function.</p>
<p>So from this we can tell the non-JIT execution is a tree walker and
that it is over <a href="https://twitter.com/ClickHouseDB/status/1705619463888900538">chunks of
columns</a>,
i.e. vectorized data, similar to Cockroach. However, in ClickHouse,
execution is <em>always</em> over column vectors.</p>
<p class="note">
In the original version of this post, I had some confusion about the
ClickHouse execution strategy. Robert Schulze from
ClickHouse <a href="https://clickhousedb.slack.com/archives/CUDSPUJ68/p1695307656700889">helped
clarify</a> things for me. Thanks Robert!
</p><h4><a href="https://github.com/duckdb/duckdb">DuckDB</a> (Ruling: Tree Walker)</h4><p>If we take a look at how <a href="https://github.com/duckdb/duckdb/blob/479c89e154f32012143d741c1a4f4d769f20044e/src/execution/expression_executor/execute_function.cpp#L59">function expressions are
executed</a>,
we can see each <a href="https://github.com/duckdb/duckdb/blob/479c89e154f32012143d741c1a4f4d769f20044e/src/execution/expression_executor/execute_function.cpp#L66">argument in the function being
evaluated</a>
before being passed to the actual function. So that looks like a
tree-walking interpreter.</p>
<p>Like ClickHouse, DuckDB expression execution is always over column
vectors. You can read more about this architecture
<a href="https://duckdb.org/internals/vector.html">here</a> and
<a href="https://www.infoq.com/articles/analytical-data-management-duckdb/">here</a>.</p>
<h4><a href="https://github.com/influxdata/influxdb">Influx</a> (Ruling: Tree Walker)</h4><p>Influx originally had a SQL-like query language called InfluxQL. If we
look at <a href="https://github.com/influxdata/influxdb/blob/b3b982d746fdc34451ca44d262f83b483cd9ea33/storage/reads/influxql_eval.go#L41">how it evaluates a binary
expression</a>,
it first evaluates the left-hand side and then the right-hand side
before operating on the sides and returning a value. That's a
tree-walking interpreter.</p>
<p><a href="https://github.com/influxdata/flux">Flux</a> was the new query language
for Influx. While the Flux
<a href="https://github.com/influxdata/flux/blob/master/docs/VirtualMachine.md">docs</a>
suggest they transform to an intermediate representation that is
executed on a virtual machine, there's nothing I'm seeing that looks
like a stack- or register-based virtual machine. All the <a href="https://github.com/influxdata/flux/blob/master/interpreter/interpreter.go#L352">evaluation
functions</a>
evaluate their arguments and return a value. That looks like a
tree-walking interpreter to me.</p>
<p>Today Influx
<a href="https://www.influxdata.com/blog/the-plan-for-influxdb-3-0-open-source/">announced</a>
that Flux is in maintenance mode and they are focusing on InfluxQL
again.</p>
<h4><a href="https://github.com/MariaDB/server">MariaDB</a> / <a href="https://github.com/mysql/mysql-server">MySQL</a> (Ruling: Tree Walker)</h4><p>Control flow methods are normally a good way to see how an interpreter
is implemented. The implementation of COALESCE <a href="https://github.com/MariaDB/server/blob/e9573c059656d9477c2176f102f7e79d0f1ca6b0/sql/item_cmpfunc.cc#L3431">looks pretty
simple</a>. We
see it <a href="https://github.com/MariaDB/server/blob/e9573c059656d9477c2176f102f7e79d0f1ca6b0/sql/item_cmpfunc.cc#L3442">call
<code>val_str()</code></a>
for each argument to COALESCE. But I can only seem to find
implementations of <code>val_str()</code> on raw values and not
expressions. <code>Item_func_coalesce</code> itself does not implement
<code>val_str()</code> for example, which would be a strong indication of a tree
walker. Maybe it does implement <code>val_str()</code> through inheritance.</p>
<p>It becomes a little clearer if we look at non-control flow methods
like
<a href="https://github.com/MariaDB/server/blob/e9573c059656d9477c2176f102f7e79d0f1ca6b0/sql/item_func.cc#L2048"><code>acos</code></a>. In
this method we see <code>Item_func_acos</code> itself implement <code>val_real()</code> and
also call <code>val_real()</code> on all its arguments. In this case it's obvious
how the control flow of <code>acos(acos(.5))</code> would work. So that seems to
indicate expressions are executed with a tree walking interpreter.</p>
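<p>To make that shape concrete, here is a minimal Python sketch (not MariaDB's actual C++). The class names <code>Literal</code> and <code>FuncAcos</code> are hypothetical stand-ins for MariaDB's <code>Item</code> classes, and returning <code>None</code> for out-of-range input is an assumption modelled on MySQL's documented <code>ACOS</code> behavior of returning NULL outside -1 to 1.</p>

```python
import math


class Literal:
    """A leaf node: a raw value with nothing left to evaluate."""

    def __init__(self, value):
        self.value = value

    def val_real(self):
        return self.value


class FuncAcos:
    """An expression node: evaluates its argument, then applies acos."""

    def __init__(self, arg):
        self.arg = arg

    def val_real(self):
        # Recursing into the child node is what makes this a tree walker.
        x = self.arg.val_real()
        if x is None or not -1.0 <= x <= 1.0:
            return None  # stand-in for SQL NULL (assumption, see lead-in)
        return math.acos(x)


# acos(acos(.5)): the outer node asks the inner node for its value first.
expr = FuncAcos(FuncAcos(Literal(0.5)))
result = expr.val_real()
```

<p>In this sketch the inner call yields roughly 1.047, which is outside the valid range, so the outer node ends up returning <code>None</code>. The point is only the control flow: every node evaluates its own children.</p>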
<p>I also noticed
<a href="https://github.com/MariaDB/server/blob/e9573c059656d9477c2176f102f7e79d0f1ca6b0/sql/sp_instr.cc">sql/sp_instr.cc</a>. That
is scary (in terms of invalidating my analysis) since it looks like a
virtual machine. But after looking through it, I think this virtual
machine only corresponds to how stored procedures are executed, hence
the <code>sp_</code> prefix for Stored Programs. <a href="https://dev.mysql.com/doc/dev/mysql-server/latest/stored_programs.html">MySQL
docs</a>
also explain that stored procedures are executed with a bytecode
virtual machine.</p>
<p>I'm curious why they don't use that virtual machine for query
execution.</p>
<p>As far as I can tell MySQL and MariaDB do not differ in this regard.</p>
<h4><a href="https://github.com/mongodb/mongo">MongoDB</a> (Ruling: Virtual Machine)</h4><p>Mongo <a href="https://laplab.me/posts/inside-new-query-engine-of-mongodb/">recently
introduced</a>
a virtual machine for executing queries, called Slot Based Execution
(SBE). We can find the SBE code in
<a href="https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/sbe/vm/vm.cpp#L9313">src/mongo/db/exec/sbe/vm/vm.cpp</a>
and the main virtual machine entrypoint
<a href="https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/sbe/vm/vm.cpp#L9313">here</a>. <a href="https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/sbe/vm/vm.cpp#L9419">Looks
like</a>
a classic stack-based virtual machine!</p>
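<p>For readers who haven't seen one, a stack-based virtual machine can be sketched in a few lines of Python. This is not Mongo's SBE code; the opcode names (<code>push</code>, <code>add</code>, <code>mul</code>) and the <code>run</code> function are made up for illustration.</p>

```python
def run(program):
    """Execute a list of (opcode, operand) pairs using an operand stack."""
    stack = []
    for op, operand in program:
        if op == "push":
            stack.append(operand)
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "mul":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# (1 + 2) * 3, already compiled into postfix order.
program = [
    ("push", 1),
    ("push", 2),
    ("add", None),
    ("push", 3),
    ("mul", None),
]
result = run(program)
```

<p>Operators never evaluate their operands here. They pop values that earlier instructions already pushed, which is the main thing to look for when reading a real implementation.</p>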
<p>It isn't completely clear to me if the SBE path is always used or if
there are still cases where it falls back to their old execution
model. You can read more about Mongo execution
<a href="https://github.com/mongodb/mongo/blob/master/src/mongo/db/query/README.md">here</a>
and <a href="https://www.mongodb.com/docs/manual/reference/sbe/">here</a>.</p>
<h4><a href="https://github.com/postgres/postgres">PostgreSQL</a> (Ruling: Virtual Machine + JIT)</h4><p>The top of PostgreSQL's
<a href="https://github.com/postgres/postgres/blob/cca97ce6a6653df7f4ec71ecd54944cc9a6c4c16/src/backend/executor/execExprInterp.c#L6">src/backend/executor/execExprInterp.c</a>
clearly explains that expression execution uses a virtual machine. You
see all the hallmarks: opcodes, a loop over a giant switch, etc. And
if we look at how <a href="https://github.com/postgres/postgres/blob/cca97ce6a6653df7f4ec71ecd54944cc9a6c4c16/src/backend/executor/execExprInterp.c#L728">function expressions are
executed</a>,
we see another hallmark which is that the function expression code
doesn't evaluate its arguments. They've already been evaluated. And
function expression code just acts on the results of its arguments.</p>
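<p>A rough Python sketch of that idea, under the assumption that expressions are nested tuples: the tree is first flattened into a linear list of steps, and the function step only reads slots that earlier steps have filled in. The names <code>compile_expr</code>, <code>execute</code>, <code>CONST</code>, and <code>FUNC</code> are hypothetical and do not correspond to PostgreSQL's real opcodes.</p>

```python
def compile_expr(node, steps):
    """Flatten a nested expression into linear steps; return its result slot."""
    kind = node[0]
    if kind == "const":
        steps.append(("CONST", node[1]))
    elif kind == "func":
        _, fn, args = node
        # Arguments are compiled first, so their steps run before the call.
        arg_slots = [compile_expr(arg, steps) for arg in args]
        steps.append(("FUNC", fn, arg_slots))
    else:
        raise ValueError(f"unknown node kind: {kind}")
    # Each step writes its result into the slot with the step's own index.
    return len(steps) - 1


def execute(steps):
    """Run the steps in order with a single loop and no recursion."""
    slots = [None] * len(steps)
    for i, step in enumerate(steps):
        opcode = step[0]
        if opcode == "CONST":
            slots[i] = step[1]
        elif opcode == "FUNC":
            _, fn, arg_slots = step
            # The arguments were computed by earlier steps; just read them.
            slots[i] = fn(*(slots[s] for s in arg_slots))
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return slots[-1]


# pow(2, abs(-3))
tree = ("func", pow, [("const", 2), ("func", abs, [("const", -3)])])
steps = []
compile_expr(tree, steps)
result = execute(steps)
```

<p>The recursion happens once, at compile time. Execution is a flat loop, which is also a convenient starting point for a JIT.</p>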
<p>PostgreSQL also
<a href="https://github.com/postgres/postgres/blob/master/src/backend/jit/README">supports</a>
JIT-ing expression execution. And we can find the switch between
interpreting and JIT-compiling an expression
<a href="https://github.com/postgres/postgres/blob/cca97ce6a6653df7f4ec71ecd54944cc9a6c4c16/src/backend/executor/execExpr.c#L873">here</a>.</p>
<h4><a href="https://github.com/questdb/questdb">QuestDB</a> (Ruling: Tree Walker + JIT)</h4><p>QuestDB <a href="https://questdb.io/blog/2022/01/12/jit-sql-compiler/">wrote about their execution engine
recently</a>. When
the conditions are right, they'll <a href="https://github.com/questdb/questdb/blob/11ac85510292596f0d21b10603e500f8edb5e486/core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java#L1394">switch over to a JIT
compiler</a>
and run native code.</p>
<p>But let's look at the default path. For example, how <a href="https://github.com/questdb/questdb/blob/11ac85510292596f0d21b10603e500f8edb5e486/core/src/main/java/io/questdb/griffin/engine/functions/bool/AndFunctionFactory.java#L82"><code>AND</code> is
implemented</a>. <code>AndBooleanFunction</code>
implements <code>BooleanFunction</code> which implements <code>Function</code>. An
expression can be evaluated by calling a <code>getX()</code> method on the
expression type that implements <code>Function</code>. <code>AndBooleanFunction</code> calls
<code>getBool()</code> on its left and right hand sides. And if we look at the
<a href="https://github.com/questdb/questdb/blob/11ac85510292596f0d21b10603e500f8edb5e486/core/src/main/java/io/questdb/griffin/engine/functions/BooleanFunction.java#L35">partial
implementation</a>
of <code>BooleanFunction</code> we'll also see it doing <code>getX()</code> specific
conversions during the call of <code>getX()</code>. So that's a tree-walking
interpreter.</p>
<h4><a href="https://github.com/scylladb/scylladb">Scylla</a> (Ruling: Tree Walker)</h4><p>If we take a look at how <a href="https://github.com/scylladb/scylladb/blob/08197882074227edbd0a95f49914913e3124753d/cql3/expr/expression.cc#L2145">functions are
evaluated</a>
in Scylla, we see function evaluation first <a href="https://github.com/scylladb/scylladb/blob/08197882074227edbd0a95f49914913e3124753d/cql3/expr/expression.cc#L2161">evaluating all of its
arguments</a>. And
the function evaluation function itself returns a
<code>cql3::raw_value</code>. So that's a tree-walking interpreter.</p>
<h4><a href="https://github.com/sqlite/sqlite">SQLite</a> (Ruling: Virtual Machine)</h4><p>SQLite's virtual machine is <a href="https://www.sqlite.org/opcode.html">comprehensive and
well-documented</a>. It encompasses
more than just expression evaluation but the entirety of query
execution.</p>
<p>We can find the massive virtual machine switch in
<a href="https://github.com/sqlite/sqlite/blob/8aaf63c6ac8b8292c0ecead0d2b04b68e9e6be78/src/vdbe.c#L971">src/vdbe.c</a>.</p>
<p>And if we look, for example, at how <code>AND</code> is implemented, we see it
<a href="https://github.com/sqlite/sqlite/blob/8aaf63c6ac8b8292c0ecead0d2b04b68e9e6be78/src/vdbe.c#L2536">pulling its arguments out of
memory</a>
(already evaluated) and assigning the result back to <a href="https://github.com/sqlite/sqlite/blob/8aaf63c6ac8b8292c0ecead0d2b04b68e9e6be78/src/vdbe.c#L2545">a designated
point in
memory</a>.</p>
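<p>Here is a small Python sketch of a register-style <code>AND</code> instruction. The <code>p1</code>/<code>p2</code>/<code>p3</code> operand names follow the convention in SQLite's opcode documentation, but the code itself is an illustration and not a translation of <code>vdbe.c</code>. <code>None</code> stands in for NULL, and <code>evaluate_and</code> is a hypothetical helper.</p>

```python
def op_and(registers, p1, p2, p3):
    """AND the values in registers p1 and p2; write the result to p3."""
    left, right = registers[p1], registers[p2]
    if left == 0 or right == 0:
        registers[p3] = 0  # false wins even if the other side is NULL
    elif left is None or right is None:
        registers[p3] = None  # NULL AND true, or NULL AND NULL
    else:
        registers[p3] = 1


def evaluate_and(left, right):
    """Load two already-evaluated operands into registers and run op_and."""
    registers = [left, right, None]
    op_and(registers, 0, 1, 2)
    return registers[2]


true_and_null = evaluate_and(1, None)
false_and_null = evaluate_and(0, None)
```

<p>As in the stack machine case, the instruction never evaluates an expression. It reads operands from known locations and writes to another known location.</p>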
<h4>SingleStore (Ruling: Virtual Machine + JIT)</h4><p>While there's no source code to link to, SingleStore <a href="https://www.youtube.com/watch?v=_vloWsdPCDs&t=3810s">gave a talk at
CMU</a> that broke
down their query execution pipeline. Their
<a href="https://docs.singlestore.com/cloud/query-data/advanced-query-topics/code-generation/">docs</a>
also cover the topic.</p>
<p><img src="/assets/memsql.webp" alt="SingleStore compiler pipeline"></p>
<h4><a href="https://github.com/pingcap/tidb">TiDB</a> (Ruling: Tree Walker)</h4><p>Similar to DuckDB and ClickHouse, TiDB implements vectorized
interpretation. They've <a href="https://www.pingcap.com/blog/10x-performance-improvement-for-expression-evaluation-made-possible-by-vectorized-execution/">written publicly about their switch to this
method</a>.</p>
<p>Let's take a look at how <code>if</code> is implemented in TiDB. There is a
non-vectorized and a vectorized version of <code>if</code> (in
<a href="https://github.com/pingcap/tidb/blob/3ccd09e63addddeb0d33b5b87594a2d61fffd1d8/expression/builtin_control.go">expression/builtin_control.go</a>
and
<a href="https://github.com/pingcap/tidb/blob/3ccd09e63addddeb0d33b5b87594a2d61fffd1d8/expression/builtin_control_vec_generated.go">expression/builtin_control_vec_generated.go</a>
respectively). So maybe they haven't completely switched over to
vectorized execution or maybe it can only be used in some conditions.</p>
<p>If we look at the <a href="https://github.com/pingcap/tidb/blob/3ccd09e63addddeb0d33b5b87594a2d61fffd1d8/expression/builtin_control.go#L599">non-vectorized version of
<code>if</code></a>,
we see the <a href="https://github.com/pingcap/tidb/blob/3ccd09e63addddeb0d33b5b87594a2d61fffd1d8/expression/builtin_control.go#L600">condition
evaluated</a>. And
then the <code>then</code> or <code>else</code> is evaluated <a href="https://github.com/pingcap/tidb/blob/3ccd09e63addddeb0d33b5b87594a2d61fffd1d8/expression/builtin_control.go#L604">depending on the result of the
condition</a>. That's
a tree-walking interpreter.</p>
<h3 id="conclusion">Conclusion</h3><p>As the DuckDB team <a href="https://duckdb.org/why_duckdb.html">points out</a>,
vectorized interpretation or JIT compilation <a href="https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf">seem like the
future</a> for
database expression execution. These strategies seem particularly
important for analytics or time-series workloads. But vectorized
interpretation seems to make the most sense for column-wise storage
engines. And column-wise storage normally only makes sense for
analytics workloads. Still, TiDB and Cockroach are transactional
databases that also vectorize execution.</p>
<p>And while SQLite and PostgreSQL use the virtual machine model, it's
possible databases with tree-walking interpreters like Scylla and
MySQL/MariaDB have decided there are not significant enough gains to be
had (for transactional workloads) to justify the complexity of moving
to a compiler + virtual machine architecture.</p>
<p>Tree-walking interpreters and virtual machines are also independent
from whether or not execution is vectorized. So that will be another
interesting dimension to watch: if more databases move toward
vectorized execution even if they don't adopt JIT compilation.</p>
<p>Yet another alternative is that maybe as databases mature we'll see
compilation tiers similar to what <a href="https://webkit.org/blog/9329/a-new-bytecode-format-for-javascriptcore/">browsers
do</a>
<a href="https://v8.dev/blog/sparkplug">with JavaScript</a>.</p>
<p>Credits: Thanks Max Bernstein, Alex Miller, and Justin Jaffray for
reviewing a draft version of this! And thanks to the #dbs channel on
<a href="https://eatonphil.com/discord.html">Discord</a> for instigating this
post!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I spent some time looking into how various databases execute expressions in their query language.<br><br>Most of them have a tree-walking interpreter, some have a virtual machine, and some do just-in-time compilation.<br><br>Let's dig into some database code to see!<a href="https://t.co/BIGtHKh1X4">https://t.co/BIGtHKh1X4</a> <a href="https://t.co/nmhe9HmYw7">pic.twitter.com/nmhe9HmYw7</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1704936432412868725?ref_src=twsrc%5Etfw">September 21, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2023-09-21-how-do-databases-execute-expressions.htmlThu, 21 Sep 2023 00:00:00 +0000
- Eight years of organizing tech meetupshttp://notes.eatonphil.com/eight-years-of-tech-meetups.html<p>This is a collection of random personal experiences. So if you don't
want to read everything, feel free to skip to the end for takeaways.</p>
<p>I write because I'd like to see more high-quality meetups. And maybe
my little bit of experience will help someone out.</p>
<h3 id="2015:-philadelphia">2015: Philadelphia</h3><p>I first tried to organize a meetup in Philly in 2015. I was
contracting at the time and I figured a meetup might be a good way to
source contracts or just meet interesting people. I created the
"Philadelphia Software in Business" (or some other similarly vaguely
named) group on Meetup.com.</p>
<p>I didn't have any network; the first companies I worked for were not
in Philly. But Meetup.com got me a few tens of people joining the
group.</p>
<p>My first challenge was finding a place to meet. I didn't know what I
was doing so I looked at restaurants, bars, and cafes for dedicated
event space. Needless to say, renting space was expensive on its
own. And there was always an additional required minimum dollar spent
per attendee.</p>
<p>I ultimately found a place near the Schuylkill River. Maybe it was a
community event space. Maybe I paid for it. I can't remember.</p>
<p>The first and only time I hosted an event for the group, I got a
surprising number of people for such a vague topic. There were maybe 6
of us. I was the youngest by far (I was 20); the others were
middle-aged: Excel users and one visionary type.</p>
<p>There was no real point to the meetup and I didn't continue doing
it.</p>
<h3 id="2016---2017:-linode">2016 - 2017: Linode</h3><p>While I was at Linode, I organized "hack nights". I didn't ask for
anyone's approval before starting it. I just said I'd be ordering
pizza for anyone interested in staying after work to hack on
Linode-related projects. I was willing to pay for the pizza, in part
because I didn't want to risk being shut down by asking. But caker
paid for it each time.</p>
<p>I was nervous because people would show up and ask for pizza and not
want to hack. It was company-provided with the expectation of doing
Linode-related work. Maybe I mentioned this or not. I can't
remember. I'm pretty sure they got their pizza.</p>
<p>Aside from myself, developers at Linode didn't really attend. The
folks who attended were support staff or folks from the technical
writing team who wanted more experience coding.</p>
<p>I ran this for maybe 3 to 5 Wednesdays before not continuing. It was
pretty fun! But staying after work for a few hours each Wednesday lost
its charm.</p>
<h4 id="book-club">Book Club</h4><p>Another time at Linode I started a book club. I was very torn about
attempting to make the book club open to anyone in the area or just to
Linode employees.</p>
<p>I knew I'd probably get more people to attend if I made it public. But
I wasn't sure if Linode would be cool with having external folks in
the office. Before they moved to the Old City office, visitors weren't
really a thing.</p>
<p>So I made it private to Linode. And I started with the most obvious
book for your average developer: Practical Common Lisp.</p>
<p>I am pretty sure I learned one big trick by this time though. When I
announced I'd be starting the book club I said something like this:</p>
<blockquote><p>Hey folks! I'm thinking of starting a book club. A book I have in
mind to start with is Practical Common Lisp. If I get at least one
other person to join in then I'll move forward!</p>
</blockquote>
<p>I ended up getting two folks: one developer and one support staff
member. We held the book club for 30 minutes once a week, covering one
chapter each week. I was the only one who read anything I think, but
the other two guys faithfully showed up for discussion.</p>
<p>I didn't ask for permission to do this either. And this time we met
during company time. I think it was 2-2:30PM.</p>
<p>It was fun. We finished the book. But Practical Common Lisp probably
wasn't a good choice. And I don't think I started a second book.</p>
<h3 id="2017---2020:-false-starts">2017 - 2020: False starts</h3><p>I moved to NYC and joined a small startup (~20 employees). Linode was
100+ employees.</p>
<p>We were in a WeWork so I considered starting a book club that was
public to the WeWork. I had learned by then the law of numbers: I
probably wouldn't get anyone from my company to join.</p>
<p>I considered putting up posters around the WeWork to advertise. But in
the end, I didn't end up going through with anything.</p>
<p>I did present at a few meetups in NYC during this time. But I didn't
organize anything.</p>
<p>And then the pandemic hit and everything disappeared.</p>
<h3 id="2021---2022:-virtual">2021 - 2022: Virtual</h3><p>In 2021 I started contracting again, thinking about starting a
company. I wanted a community to be at the center.</p>
<p>So I started a <a href="https://eatonphil.com/discord.html">Discord focused on software
internals</a>.</p>
<p>I had a bit more of a network at this point so I posted about the
Discord on Twitter and got 100 likes or something and slowly started
gaining folks in the Discord.</p>
<p>I knew it was going to do better if I was pretty active in it so I
made sure to post interesting blog posts at some regular
interval. About compilers or databases or something.</p>
<p>The Discord didn't turn out to help me out much on the
starting-a-company front. Or I didn't use it effectively for that.</p>
<p>I wanted more of an independent Discord of cool people who like to
learn about systems internals. And that's what I got.</p>
<p>This turned out to be ok though because I stopped working on that
company and the Discord is still around and I still get to hang out
with cool people.</p>
<p>The Discord hit 1,700 members recently. Among
other things, it has developers from many different database companies
in it these days. They hang out and help noobs like me learn
about database internals.</p>
<p>I culled inactive members recently, so today the total is around
1,100.</p>
<h4 id="hacker-nights">Hacker Nights</h4><p>During the pandemic I became frustrated that all the good meetups
disappeared so I decided to start an online one that would be somewhat
tied to the Discord and be about software internals.</p>
<p>I would find 2 or 3 people to present for 10-20 minutes each on
anything to do with software internals. We'd meet once a month at 8PM
NY time I think.</p>
<p>To get speakers I'd mostly DM people who I saw do interesting things
on Twitter or Hacker News. I was lucky to have <a href="https://www.philipotoole.com/">Philip
O'Toole</a> (author of rqlite), <a href="https://sirupsen.com/">Simon
Eskildsen</a> (author of the Napkin Math blog),
<a href="https://rsms.me/">Rasmus Andersson</a>, and many other excellent folks
speak.</p>
<p>You can find <a href="https://www.youtube.com/playlist?list=PL2t91m2Rvccpg2q2o_8lfuTYUhoP3AMwq">videos of these talks on
YouTube</a>.</p>
<p>The events were organized on Meetup.com. The group grew quickly
and I'd have about 100 people RSVP to each event. 10-20 normally
showed up.</p>
<p>I'd post a Zoom link on Meetup.com. Sometimes Meetup.com crashed right
as the meetup started, so no one could get a link. That was fun.</p>
<p>On two different nights I had Zoom bombers show up and play crazy
music or impersonate other members of the call and act weirdly (Zoom
lets you change your name after you've joined the call).</p>
<p>I learned a little bit about how to administrate a Zoom meeting.</p>
<p>I ran Hacker Nights for 5 months. It was tiring to
find speakers, tiring to deal with Zoom bombers. It was thankless and
I wasn't really enjoying it.</p>
<p>I was proud though that I was offering a channel for developers to
learn about software internals of compilers, databases, etc. And it
was great to meet many interesting speakers and attendees.</p>
<h3 id="2023:-designing-data-intensive-applications">2023: Designing Data Intensive Applications</h3><p>A month ago I put out a call on Twitter for folks in NYC interested in
reading through the book Designing Data Intensive Applications.</p>
<p>I'd read the book before and while it was challenging, I knew it was
immensely useful to any developer who works with data or an API.</p>
<p>By this time I'd learned my second trick: not asking for public
responses.</p>
<p>I said something like:</p>
<blockquote><p>Hey folks! I'm thinking of starting a book club meeting in Midtown
NYC reading through Designing Data Intensive Applications. DM me if
you'd be interested! If I get 2 other interested folks this will be on!</p>
</blockquote>
<p>I got maybe 40 DMs and 20 of them were based in NYC. Attendance thus
would have been higher if I made the book club virtual. But virtual
events take about as much effort as in-person events and somehow feel
less rewarding. So I went through with the NYC group.</p>
<p>I'm sure I could have gotten some company to provide us space, but
this would just mean more negotiation for me and tedium for everyone
involved (bring your ID to be checked in, make sure you're registered,
etc.).</p>
<p>The group would meet every 2 weeks and cover 2 chapters at a
time. We'd meet for 30 minutes. To avoid needing to find a place to
meet, we'd meet in public at Bryant Park. (There turns out to be
plenty of available seating on Fridays at 9AM in Bryant Park. When it
rains we meet online.)</p>
<p>I wanted to keep the overhead minimal and the timeline slightly
aggressive. We'd be through the book in only 3 months. No crazy
commitment.</p>
<p>We've met twice now and are 25% done with the book. Attendance has been
around 7 to 9 people each time so far, or a little less than 50%.</p>
<p>They're almost all software developers, with one manager I think, who
work for a variety of large and small tech companies.</p>
<p>I'm loving it so far. And if it continues to go well, I'll probably
continue running in-person book clubs.</p>
<p>But it would only meet a few months a year, giving me a break of a
few months from running it.</p>
<h3 id="takeaways:-the-meh">Takeaways: The meh</h3><p>Organizing any event takes effort. Meetups are especially hard because
you need to find a place to run the meetup, you probably want to
provide food, and you need to find speakers.</p>
<p>Often you can find a single place to host the meetup, but you have to
constantly search for new speakers. Even one of the greatest meetups
in NYC, Papers We Love, seems to be struggling to find speakers.</p>
<p>The <a href="https://db.cs.cmu.edu/seminar2023/">CMU Database Group</a> and the
<a href="http://charap.co/category/reading-group/">Distributed Systems Reading
Group</a> seem to have the
right idea though. They only run sessions part of the year, and they
plan out all sessions in advance (including speakers).</p>
<p>However, they are both virtual. And I'm not so interested in running
virtual events anymore.</p>
<h3 id="takeaways:-the-good">Takeaways: The good</h3><p>For one, meetups are an awesome way to meet random people and expand
your network.</p>
<p>Two, they're educational. Even beyond the content you are meeting
about, there's the discussion alongside it you wouldn't get by
yourself. And you, as organizer, get to pick the topic.</p>
<p>These work out great for me. I love to meet people, and I love to
learn.</p>
<h4 id="tricks">Tricks</h4><p>Starting something new is embarrassing because you're putting yourself
out there. Maybe no one in your network shares your interests (to the
degree or in the direction you do).</p>
<p>My tricks are:</p>
<ul>
<li>Most importantly: keep things low key! Don't stress people
out. Before learning and networking, the point of meetups is (or should
be) fun.</li>
<li>Saying you are "thinking about X" is a lightweight way to gauge
interest. As compared to just saying you're "starting X", which gives
you less room to back out if there turns out not to be interest.</li>
<li>Asking people to DM you with interest is less embarrassing than
asking for people to respond in public. Not everyone would want to
respond in public. If there's interest in private, you can share the
interest in public later on. But if you only ask for responses in
public and there are no responses, that can feel embarrassing.</li>
<li>Indicating success criteria can help people understand how big
you're thinking of. I'm normally fine with doing something as small
as only two other people, so I say that. It's kind of like how
Kickstarters work with minimum funding levels.</li>
</ul>
<p>These ideas apply to corporate planning too. I think about them when
I'm sharing some new idea in company Slack as much as when I share on
Twitter.</p>
<p>A note on attendance rates: 10-20% actual attendance versus RSVP seems
normal. If you get a higher percentage of people actually attending
versus RSVP-ing you're doing pretty well!</p>
<h4 id="finding-sponsors">Finding sponsors</h4><p>One final idea is about paying for space or paying for food. Companies
with space and money for food are often willing to partner with folks
willing to do the work to run an event.</p>
<p>Running your own event in a company's space is advertising for
them. They get to be associated with cool tech. It's a chance for them
to pitch their open positions.</p>
<p>Obviously this happens often when you start a meetup hosted by your
own company. But you can also find other companies to host space.</p>
<p>The kind of people to find to make this happen are senior
developers or engineering managers, often on Twitter and sometimes on
LinkedIn.</p>
<p>I haven't done this myself yet because I'm not ready to commit to
running a meetup. But I see it happen. And it's the approach I'd take
if I were to run a real meetup again.</p>
<p>Though now that I've got some time off there are a few talks I'd like
to do myself.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a post on my experience organizing tech meetups of various stripes over the years. And a few things I've learned.<br><br>"meetups" taken pretty broadly to include online communities, book clubs, and actual speaker events.<a href="https://t.co/xnd0LTneup">https://t.co/xnd0LTneup</a> <a href="https://t.co/w1oEaSNDHb">pic.twitter.com/w1oEaSNDHb</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1698793650036031753?ref_src=twsrc%5Etfw">September 4, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/eight-years-of-tech-meetups.htmlMon, 04 Sep 2023 00:00:00 +0000
- Thinking about functional programminghttp://notes.eatonphil.com/2023-08-15-thinking-about-functional-programming.html<p>Someone on Discord asked about how to learn functional programming.</p>
<p>The question and my initial tweet on the subject prompted an
interesting
<a href="https://twitter.com/ShriramKMurthi/status/1691548254331092992">discussion</a>
with <a href="https://twitter.com/ShriramKMurthi">Shriram
Krishnamurthi</a> and other folks.</p>
<p>So here's a slightly more thought out exploration.</p>
<p>And just for backstory's sake: I spent a few years a while
ago <a href="https://github.com/eatonphil/ponyo">programming in
Standard ML</a> and I wrote
a <a href="https://github.com/eatonphil/bsdscheme">chunk of a Scheme
implementation</a>. I'm not an expert, but I have a bit of background.</p>
<p>Hey, this is a free opinion.</p>
<h3 id="concepts-from-functional-programming">Concepts from functional programming</h3><p>When people talk about functional programming, I think of a few key
choices you can make while programming:</p>
<ul>
<li>Immutability by default</li>
<li>(Tail) recursion by default</li>
<li>First-class functions (and the suite of tools that go along with it. e.g. map, reduce/fold)</li>
</ul>
<p>And if you have experience as a programmer, you either get the basic
gist of these tenets or you can easily read about the basics.</p>
<p>That said, while most programmers I've met understood the basics, most
programmers I've met were not particularly <em>comfortable</em> or <em>fluent</em>
expressing programs with these ideas.</p>
<p>For myself, the only way I got comfortable expressing code with these
ideas was lots of practice (as I mentioned above). And yet,
even after I did a bunch of programming in Standard ML and Scheme, I
really didn't see a particular benefit to practicing in a language
other than one with which I was already generally comfortable.</p>
<p>You have to learn a lot of other random things when you pick up Scheme
or Standard ML that aren't just: practice immutability by default,
recursion, and first-class functions.</p>
<p>So I think it's kind of misguided when person A asks how to learn
functional programming and person B responds that they should learn
Haskell or OCaml or whatever. I see this happen pretty often online.</p>
<p class="note">
Beyond any "language for functional programming" as a recommendation
in general, Haskell is a particularly egregious suggestion to make
in my opinion because not only are you trying to practice functional
programming tenets but you're also dealing with a complex type
system and lazy evaluation.
</p><p>Instead, <a href="https://notes.eatonphil.com/practicing-recursion.html">practice immutability, recursion,
map/reduce</a> in
whatever language you like.</p>
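<p>For example, here is a small sketch of all three ideas in Python, a language most people wouldn't call functional. The function and variable names are made up for illustration.</p>

```python
from functools import reduce


def total_length(words):
    """Recursion instead of a loop with a mutable counter."""
    if not words:
        return 0
    head, *tail = words
    return len(head) + total_length(tail)


# A tuple cannot be modified in place, so every step builds a new value.
words = ("map", "reduce", "fold")

# First-class functions: pass str.upper and a lambda as arguments.
upper = tuple(map(str.upper, words))
total = reduce(lambda acc, word: acc + len(word), words, 0)
recursive_total = total_length(words)
```

<p>One caveat: CPython does not eliminate tail calls, so deep recursion will hit the recursion limit. That's fine for practice, and it's the kind of thing you learn about your own language by trying this.</p>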
<h3 id="programming-languages">Programming languages</h3><p>If you want to study programming languages, that's awesome. However,
functional programming doesn't really have any direct connection to
studying programming languages.</p>
<p>Languages are all over the place. Scheme, Standard ML, and Haskell are
worlds apart, even within the functional programming family.</p>
<p>And modern languages have mostly adopted the aspects of functional
programming that used to be unique 20 years ago.</p>
<p>Moreover, there are many other worthwhile families of languages to
learn about:</p>
<ul>
<li>Imperative/C-like (ok, you probably already know these)</li>
<li>Stack-based (JVM, x86 assembly sort of, Forth)</li>
<li>Array-oriented (APL, J)</li>
<li>Declarative (CSS, SQL, TLA+, Prolog)</li>
<li>Data (HTML, JSON, YAML)</li>
<li>Proof assistants (Isabelle/HOL, Coq)</li>
</ul>
<p>The list isn't exhaustive, and the variations within families can be
massive. But the point is that functional programming doesn't mean
crazy programming languages or crazy programming ideas. Functional
programming is a <em>subset</em> of crazy programming languages and crazy
programming ideas.</p>
<p>If you want to learn about crazy programming languages and crazy
programming ideas, you should! Go for it!</p>
<h3 id="introduction-to-computer-science">Introduction to Computer Science</h3><p>SICP is famous as the (former) introductory textbook for computer
science at MIT, and for its use of Scheme and the <a href="https://en.wikipedia.org/wiki/Meta-circular_evaluator">Metacircular
Evaluator</a>.</p>
<p>I don't have any experience teaching beginners how to program so I
don't have thoughts on if this made sense. That's for folks like
<a href="https://twitter.com/ShriramKMurthi/status/1691548254331092992">Shriram</a>
to think about.</p>
<p>However, I'm a half-decent programmer and I can't make it through this
book. If you liked the book or want to read it, that's great! But I <a href="https://notes.eatonphil.com/recommending-a-book.html">don't
recommend</a> it to
anyone.</p>
<p>And many introductory Computer Science textbooks just don't make much
sense to give to experienced programmers. For an experienced
programmer, they can be quite slow!</p>
<p>Most of the folks I see asking about how to learn functional
programming are experienced programmers.</p>
<h3 id="do-whatever-you-feel-like-doing">Do whatever you feel like doing</h3><p>I don't mean to overanalyze things, or get you overanalyzing
things. If you want to learn functional programming by writing
Haskell, that's awesome, you should go for it.</p>
<p>Wanting to do something is basically the best motivation there is.</p>
<p>The only reason I write this sort of post is so that folks who think
that using Haskell or Standard ML or Scheme or reading SICP is the
only way to learn functional programming see those ideas aren't
necessarily true.</p>
<h3 id="write-a-scheme!">Write a Scheme!</h3><p>Finally, for folks with time and motivation wanting to seriously work
out their functional programming muscles, writing a Scheme
implementation with a decent chunk of the standard library can be an
immensely enjoyable project.</p>
<p>You'll learn a lot about languages and compilers and algorithms and
data structures. It's leetcode with meaning.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a short post on this idea about different things to think about when talking about learning functional programming<br><br>1. Core concepts (immutability, first-class functions, recursion)<br><br>2. Exploring programming languages<br><br>3. Teaching CS to students<a href="https://t.co/k4LzvnHbNs">https://t.co/k4LzvnHbNs</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1691617741764018430?ref_src=twsrc%5Etfw">August 16, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2023-08-15-thinking-about-functional-programming.htmlTue, 15 Aug 2023 00:00:00 +0000
- We put a distributed database in the browser – and made a game of ithttp://notes.eatonphil.com/2023-07-11-we-put-a-distributed-database-in-the-browse.html<head>
<meta http-equiv="refresh" content="4;URL='https://tigerbeetle.com/blog/2023-07-11-we-put-a-distributed-database-in-the-browser/'" />
</head><p>This is an external post of mine. Click
<a href="https://tigerbeetle.com/blog/2023-07-11-we-put-a-distributed-database-in-the-browser/">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2023-07-11-we-put-a-distributed-database-in-the-browse.htmlTue, 11 Jul 2023 00:00:00 +0000
- Metaprogramming in Zig and parsing CSShttp://notes.eatonphil.com/2023-06-19-metaprogramming-in-zig-and-parsing-css.html<p>I knew Zig supported some sort of reflection on types. But I had been
confused about how to use it. What's the difference between
<code>@typeInfo</code> and <code>@TypeOf</code>? I ignored this aspect of Zig until a
problem came up at <a href="https://tigerbeetle.com">work</a> where reflection
made sense.</p>
<p>The situation was parsing and storing parsed fields in a struct. Each
field name that is parsed should match up to a struct field.</p>
<p>This is a fairly common problem. So this post walks through how to use
Zig's metaprogramming features in a simpler but related domain:
parsing CSS into typed objects, and pretty-printing these typed CSS
objects.</p>
<p>I live-streamed the implementation of this project yesterday on
<a href="https://www.twitch.tv/eatonphil">Twitch</a>. The video is <a href="https://youtube.com/@eatonphil">available on
YouTube</a>. And the source is <a href="https://github.com/eatonphil/zig-metaprogramming-css-parser">available
on GitHub</a>.</p>
<p>If you want to skip the parsing steps and just see the
metaprogramming, jump to the implementation of
<a href="#<code>match_property</code>">match_property</a>.</p>
<h3 id="parsing-css">Parsing CSS</h3><p>Let's imagine a CSS that only has alphabetical selectors, property
names and values.</p>
<p>The following would be valid:</p>
<div class="highlight"><pre><span></span><span class="nt">div</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">background</span><span class="p">:</span><span class="w"> </span><span class="kc">black</span><span class="p">;</span>
<span class="w"> </span><span class="k">color</span><span class="p">:</span><span class="w"> </span><span class="kc">white</span><span class="p">;</span>
<span class="p">}</span>
<span class="nt">a</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">color</span><span class="p">:</span><span class="w"> </span><span class="kc">blue</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Thinking about the structure of this stripped down CSS we've got:</p>
<ol>
<li>CSS properties that consist of property names and values (in our case the property names are limited to <code>background</code> and <code>color</code>)</li>
<li>CSS rules that have a selector and a list of properties</li>
<li>CSS sheets that have a list of rules</li>
</ol>
<p>Turning that into Zig in <code>main.zig</code>:</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">CSSProperty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">union</span><span class="p">(</span><span class="k">enum</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">unknown</span><span class="o">:</span><span class="w"> </span><span class="kt">void</span><span class="p">,</span>
<span class="w"> </span><span class="n">color</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">background</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="p">};</span>
<span class="kr">const</span><span class="w"> </span><span class="n">CSSRule</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">selector</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">properties</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">CSSProperty</span><span class="p">,</span>
<span class="p">};</span>
<span class="kr">const</span><span class="w"> </span><span class="n">CSSSheet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">rules</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">CSSRule</span><span class="p">,</span>
<span class="p">};</span>
</pre></div>
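<p>To get a feel for these types before any parsing exists, here is a
small sketch that builds one rule by hand and reads a value back out
of the tagged union. The <code>first_color</code> helper is made up for this
example and is not part of the project; the type definitions are
repeated (minus <code>CSSSheet</code>) so the snippet compiles on its own.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

const CSSProperty = union(enum) {
    unknown: void,
    color: []const u8,
    background: []const u8,
};

const CSSRule = struct {
    selector: []const u8,
    properties: []CSSProperty,
};

// Hypothetical helper (not in the original project): returns the value
// of the first `color` property in a rule, or null if there is none.
fn first_color(rule: CSSRule) ?[]const u8 {
    for (rule.properties) |property| {
        switch (property) {
            .color => |value| return value,
            else => {},
        }
    }
    return null;
}

pub fn main() void {
    // Built by hand here; the parser will produce the same shape.
    var properties = [_]CSSProperty{
        .{ .background = "black" },
        .{ .color = "white" },
    };
    const rule = CSSRule{ .selector = "div", .properties = &properties };
    std.debug.print("{s}: {s}\n", .{ rule.selector, first_color(rule) orelse "(none)" });
}
</pre></div>
<p>Switching on the union tag by hand like this works fine for two
properties, but it has to be updated every time a property is added.</p>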
<p>The parser is going to look for CSS rules which contain a selector and
a list of CSS properties. The entrypoint is that simple:</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span>
<span class="w"> </span><span class="n">arena</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">,</span>
<span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSSheet</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rules</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">CSSRule</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">());</span>
<span class="w"> </span><span class="c1">// Parse rules until EOF.</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_rule</span><span class="p">(</span><span class="n">arena</span><span class="p">,</span><span class="w"> </span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">rules</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">res</span><span class="p">.</span><span class="n">rule</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// In case there is trailing whitespace before the EOF,</span>
<span class="w"> </span><span class="c1">// eating whitespace here makes sure we exit the loop</span>
<span class="w"> </span><span class="c1">// immediately before trying to parse more rules.</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">CSSSheet</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">rules</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rules</span><span class="p">.</span><span class="n">items</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
<p>Let's implement the <code>eat_whitespace</code> helper we've referenced. It
advances a cursor into the CSS source for as long as it sees whitespace.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span>
<span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">initial_index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_index</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ascii</span><span class="p">.</span><span class="n">isWhitespace</span><span class="p">(</span><span class="n">css</span><span class="p">[</span><span class="n">index</span><span class="p">]))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>In our stripped-down version of CSS all we have to think about is
ASCII. So the standard library's <code>std.ascii.isWhitespace()</code> function is perfect.</p>
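<p>A quick sketch of how the helper behaves at a few cursor positions.
The function is repeated so the snippet compiles on its own, and the
input string is just something made up for the example.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Same helper as above, repeated so this sketch is self-contained.
fn eat_whitespace(css: []const u8, initial_index: usize) usize {
    var index = initial_index;
    while (index < css.len and std.ascii.isWhitespace(css[index])) {
        index += 1;
    }
    return index;
}

pub fn main() void {
    const css = " \n\tdiv { }  ";
    // Leading space, newline, and tab are skipped: the cursor lands on 'd'.
    std.debug.print("{d}\n", .{eat_whitespace(css, 0)});
    // Already on a non-whitespace character: the cursor does not move.
    std.debug.print("{d}\n", .{eat_whitespace(css, 3)});
    // Trailing whitespace: the cursor stops at css.len, never past it.
    std.debug.print("{d}\n", .{eat_whitespace(css, 10)});
}
</pre></div>
<p>The last case is the one the main loop in <code>parse</code> relies on: once
only whitespace remains, the returned index equals <code>css.len</code> and the
loop exits.</p>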
<p>Next, parsing CSS rules.</p>
<h4 id="<code>parse_rule()</code>"><code>parse_rule()</code></h4><p>A rule consists of a selector, an opening curly brace, any number of
properties, and a closing curly brace. We need to remember to eat
whitespace between each piece of syntax.</p>
<p>And we'll reference a few more parsing helpers we'll talk about next
for the selector, braces, and properties.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">ParseRuleResult</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">rule</span><span class="o">:</span><span class="w"> </span><span class="n">CSSRule</span><span class="p">,</span>
<span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">fn</span><span class="w"> </span><span class="n">parse_rule</span><span class="p">(</span>
<span class="w"> </span><span class="n">arena</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">,</span>
<span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">initial_index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">ParseRuleResult</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">initial_index</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// First parse selector(s).</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">selector_res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_identifier</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">selector_res</span><span class="p">.</span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Then parse opening curly brace: {.</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_syntax</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">'{'</span><span class="p">);</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">properties</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">());</span>
<span class="w"> </span><span class="c1">// Then parse any number of properties.</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">css</span><span class="p">[</span><span class="n">index</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">'}'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">attr_res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_property</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">attr_res</span><span class="p">.</span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">properties</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">attr_res</span><span class="p">.</span><span class="n">property</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Then parse closing curly brace: }.</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_syntax</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">'}'</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">ParseRuleResult</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">rule</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">CSSRule</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">selector</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">selector_res</span><span class="p">.</span><span class="n">identifier</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">properties</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">properties</span><span class="p">.</span><span class="n">items</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
<p>The <code>parse_syntax</code> helper is pretty simple: it does a bounds check and
returns the next cursor position if the current character matches the
one you pass in. Otherwise it reports the failure and returns an error.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">parse_syntax</span><span class="p">(</span>
<span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">initial_index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span>
<span class="w"> </span><span class="n">syntax</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">usize</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">initial_index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">css</span><span class="p">[</span><span class="n">initial_index</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">syntax</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">initial_index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">debug_at</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">initial_index</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected syntax: '{c}'."</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">syntax</span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">NoSuchSyntax</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>This brings us to debugging messages on failure. When we
fail to parse a piece of syntax, we want to give a useful error message and
point at the exact line and column of code where the error happens.</p>
<p>So let's implement <code>debug_at</code>.</p>
<h4 id="<code>debug_at</code>"><code>debug_at</code></h4><p>First, we walk the CSS source code until we reach the end of the
line that contains the index where the parser failed. Along the way we
keep track of the exact line and column corresponding to that
index.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">debug_at</span><span class="p">(</span>
<span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span>
<span class="w"> </span><span class="kr">comptime</span><span class="w"> </span><span class="n">msg</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">args</span><span class="o">:</span><span class="w"> </span><span class="n">anytype</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">line_no</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">col_no</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">line_beginning</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">found_line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">css</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="se">'\n'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">found_line</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">col_no</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">line_beginning</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span>
<span class="w"> </span><span class="n">line_no</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">found_line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">found_line</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">col_no</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Then we print it all out in a nice format for users (which will likely
just be ourselves).</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Error at line {}, column {}. "</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">line_no</span><span class="p">,</span><span class="w"> </span><span class="n">col_no</span><span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="n">msg</span><span class="w"> </span><span class="o">++</span><span class="w"> </span><span class="s">"</span><span class="se">\n\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">);</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"{s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">css</span><span class="p">[</span><span class="n">line_beginning</span><span class="p">..</span><span class="n">i</span><span class="p">]});</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">col_no</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">col_no</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">" "</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"^ Near here.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="p">}</span>
</pre></div>
<p>Ok, popping our mental stack, if we look back at <code>parse_rule</code> we still
need to implement <code>parse_identifier</code> and <code>parse_property</code>.</p>
<h4 id="<code>parse_identifier</code>"><code>parse_identifier</code></h4><p>An "identifier" for us here is just going to be an ASCII alphabetic
string (i.e. <code>[a-zA-Z]+</code>). This <em>really</em> simplifies CSS,
because we're going to use this one function for parsing not just selectors
but property names and even property values.</p>
<p>Zig's standard library again has a nice helper we can use: <code>std.ascii.isAlphabetic</code>.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">ParseIdentifierResult</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">identifier</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">fn</span><span class="w"> </span><span class="n">parse_identifier</span><span class="p">(</span>
<span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">initial_index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">ParseIdentifierResult</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_index</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ascii</span><span class="p">.</span><span class="n">isAlphabetic</span><span class="p">(</span><span class="n">css</span><span class="p">[</span><span class="n">index</span><span class="p">]))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">initial_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">debug_at</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">initial_index</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected valid identifier."</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">InvalidIdentifier</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">ParseIdentifierResult</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">identifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">css</span><span class="p">[</span><span class="n">initial_index</span><span class="p">..</span><span class="n">index</span><span class="p">],</span>
<span class="w"> </span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
<p>In reality, CSS identifiers are much richer than this, and selectors alone are <a href="https://www.w3schools.com/cssref/css_selectors.php">highly
complex</a>. Parsing
CSS correctly isn't the main aim of this post though. :)</p>
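<p>To make the simplification concrete, here is a minimal sketch of the
same scanning idea with the error reporting stripped out.
<code>scan_identifier</code> is a hypothetical helper for illustration
only (it is not part of the parser in this post). The test shows where
scanning stops on a hyphenated property name.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical helper: returns the run of ASCII letters starting at
// initial_index. Equivalent in spirit to parse_identifier, minus errors.
fn scan_identifier(css: []const u8, initial_index: usize) []const u8 {
    var end = initial_index;
    for (css[initial_index..]) |c| {
        if (!std.ascii.isAlphabetic(c)) break;
        end += 1;
    }
    return css[initial_index..end];
}

test "identifiers stop at the first non-letter" {
    try std.testing.expectEqualStrings("color", scan_identifier("color: red;", 0));
    // Real CSS allows hyphens and digits; this simplified scanner does not.
    try std.testing.expectEqualStrings("font", scan_identifier("font-size: 12px;", 0));
    try std.testing.expectEqualStrings("", scan_identifier("12px", 0));
}
</pre></div>
<p>So a stylesheet containing <code>font-size</code> would be rejected by
our parser at the hyphen, which is an acceptable trade-off for this post.</p>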
<h4 id="<code>parse_property</code>"><code>parse_property</code></h4><p>The final piece of CSS we need to parse is properties. These consist
of a property name, then a colon, then a property value, and finally a
semicolon. We eat whitespace before the name, after the name, and after the colon.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">ParsePropertyResult</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">property</span><span class="o">:</span><span class="w"> </span><span class="n">CSSProperty</span><span class="p">,</span>
<span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">fn</span><span class="w"> </span><span class="n">parse_property</span><span class="p">(</span>
<span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">initial_index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">ParsePropertyResult</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">initial_index</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// First parse property name.</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">name_res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_identifier</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Could not parse property name.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">e</span><span class="p">;</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">name_res</span><span class="p">.</span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Then parse colon: :.</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_syntax</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">':'</span><span class="p">);</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Then parse property value.</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">value_res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_identifier</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Could not parse property value.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">e</span><span class="p">;</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value_res</span><span class="p">.</span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Finally parse semi-colon: ;.</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_syntax</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">';'</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">property</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span><span class="n">name_res</span><span class="p">.</span><span class="n">identifier</span><span class="p">,</span><span class="w"> </span><span class="n">value_res</span><span class="p">.</span><span class="n">identifier</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">debug_at</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">initial_index</span><span class="p">,</span><span class="w"> </span><span class="s">"Unknown property: '{s}'."</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">name_res</span><span class="p">.</span><span class="n">identifier</span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">e</span><span class="p">;</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">ParsePropertyResult</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">property</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">property</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
<p>Finally we get to the first bit of metaprogramming. Once we have a
property name and value, we need to turn that into a Zig union.</p>
<p>That's what <code>match_property()</code> is going to be responsible for doing.</p>
<h3 id="<code>match_property</code>"><code>match_property</code></h3><p>This function needs to take a property name and value and return a
<code>CSSProperty</code> with the correct field (matching up to the property name
passed in) and assigned to the value passed in.</p>
<p>If we didn't have metaprogramming or reflection, the implementation
might look like this:</p>
<div class="highlight"><pre><span></span><span class="n">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span>
<span class="w"> </span><span class="n">name</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">value</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="o">.</span><span class="n">mem</span><span class="o">.</span><span class="n">eql</span><span class="p">(</span><span class="n">u8</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="s2">"color"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">CSSProperty</span><span class="p">{</span><span class="o">.</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="o">.</span><span class="n">mem</span><span class="o">.</span><span class="n">eql</span><span class="p">(</span><span class="n">u8</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="s2">"background"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">CSSProperty</span><span class="p">{</span><span class="o">.</span><span class="n">background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">error</span><span class="o">.</span><span class="n">UnknownProperty</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>That is not necessarily bad. In fact, a lot of production code ends up
looking like this as product needs evolve, because an explicit mapping
lets you keep the internal field name independent of the external
property name.</p>
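<p>As a small sketch of that last point, here is the hand-written approach
with internal field names that deliberately differ from the CSS names.
The <code>Style</code> union and <code>match_style</code> function are
hypothetical and exist only for this illustration.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical union whose field names differ from the CSS property names.
const Style = union(enum) {
    fg: []const u8,
    bg: []const u8,
};

// Hand-written mapping from external CSS name to internal field.
fn match_style(name: []const u8, value: []const u8) !Style {
    if (std.mem.eql(u8, name, "color")) {
        return Style{ .fg = value };
    } else if (std.mem.eql(u8, name, "background")) {
        return Style{ .bg = value };
    }
    return error.UnknownProperty;
}

test "external CSS names map to unrelated internal fields" {
    const s = try match_style("background", "blue");
    try std.testing.expectEqualStrings("blue", s.bg);
    try std.testing.expectError(error.UnknownProperty, match_style("margin", "auto"));
}
</pre></div>
<p>A reflection-based version cannot express this renaming without extra
bookkeeping, since it matches on the union's field names directly.</p>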
<p>However for the sake of learning, we'll try to implement the same
thing with Zig metaprogramming.</p>
<p>In particular, we can take a look at
<a href="https://github.com/ziglang/zig/blob/32cb9462ffa0a9df7a080d67eaf3a5762173f742/lib/std/json/static.zig">lib/std/json/static.zig</a>
to understand the reflection APIs.</p>
<p>If we look at lines 210-226 of that file, we can see them
iterating over the fields of a <code>Union</code>:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="p">.</span><span class="n">Union</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">unionInfo</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="kr">comptime</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">meta</span><span class="p">.</span><span class="n">trait</span><span class="p">.</span><span class="n">hasFn</span><span class="p">(</span><span class="s">"jsonParse"</span><span class="p">)(</span><span class="n">T</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">T</span><span class="p">.</span><span class="n">jsonParse</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">unionInfo</span><span class="p">.</span><span class="n">tag_type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="nb">@compileError</span><span class="p">(</span><span class="s">"Unable to parse into untagged union '"</span><span class="w"> </span><span class="o">++</span><span class="w"> </span><span class="nb">@typeName</span><span class="p">(</span><span class="n">T</span><span class="p">)</span><span class="w"> </span><span class="o">++</span><span class="w"> </span><span class="s">"'"</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(.</span><span class="n">object_begin</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnexpectedToken</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">result</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">T</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">name_token</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">nextAllocMax</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">alloc_if_needed</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="p">.</span><span class="n">max_value_len</span><span class="p">.</span><span class="o">?</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">field_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">name_token</span><span class="p">.</span><span class="o">?</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="p">.</span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">allocated_string</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">slice</span><span class="o">|</span><span class="w"> </span><span class="n">slice</span><span class="p">,</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnexpectedToken</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">unionInfo</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
</pre></div>
<p>Then right after that (lines 226-243) we see them conditionally
modifying the result object:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">unionInfo</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">field_name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Free the name token now in case we're using an allocator that optimizes freeing the last allocated object.</span>
<span class="w"> </span><span class="c1">// (Recursing into parseInternal() might trigger more allocations.)</span>
<span class="w"> </span><span class="n">freeAllocated</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">name_token</span><span class="p">.</span><span class="o">?</span><span class="p">);</span>
<span class="w"> </span><span class="n">name_token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">u_field</span><span class="p">.</span><span class="kt">type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// void isn't really a json type, but we can support void payload union tags with {} as a value.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(.</span><span class="n">object_begin</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnexpectedToken</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(.</span><span class="n">object_end</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnexpectedToken</span><span class="p">;</span>
<span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">T</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="p">{});</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Recurse.</span>
<span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">T</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parseInternal</span><span class="p">(</span><span class="n">u_field</span><span class="p">.</span><span class="kt">type</span><span class="p">,</span><span class="w"> </span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>We can see that the <code>.Union => |unionInfo|</code> condition is entered by
switching on <code>@typeInfo(T)</code> (<a href="https://github.com/ziglang/zig/blob/32cb9462ffa0a9df7a080d67eaf3a5762173f742/lib/std/json/static.zig#L149">line
149</a>)
and that <code>T</code> is a type (<a href="https://github.com/ziglang/zig/blob/32cb9462ffa0a9df7a080d67eaf3a5762173f742/lib/std/json/static.zig#L144">line
144</a>).</p>
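<p>The builtin doing the construction in that excerpt is
<code>@unionInit</code>, which builds a union value from a type, a
comptime-known field name, and a payload. Here is a minimal sketch of it
in isolation. The two-field <code>CSSProperty</code> below is an assumed
shape for illustration and may not match the definition used elsewhere
in this post.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Assumed shape of CSSProperty, for this sketch only.
const CSSProperty = union(enum) {
    color: []const u8,
    background: []const u8,
};

test "@unionInit selects the field by a comptime-known name" {
    // The field name is a string literal here, so it is known at compile time.
    const p = @unionInit(CSSProperty, "color", "red");
    try std.testing.expect(std.meta.activeTag(p) == .color);
    try std.testing.expectEqualStrings("red", p.color);
}
</pre></div>
<p>The key constraint is that the field name must be known at compile
time, while the payload can be an ordinary runtime value.</p>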
<p>We don't have a generic type though. We know we are working with a
<code>CSSProperty</code>. And we know <code>CSSProperty</code> is a union so we don't need
the <code>switch</code> either.</p>
<p>So let's apply that to our <code>match_property</code> implementation.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cssPropertyInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cssPropertyInfo</span><span class="p">.</span><span class="n">Union</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnknownProperty</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>And if we try to build that we'll get an error like this:</p>
<div class="highlight"><pre><span></span><span class="n">main</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">15</span><span class="o">:</span><span class="mi">31</span><span class="o">:</span><span class="w"> </span><span class="k">error</span><span class="o">:</span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="kt">type</span><span class="w"> </span><span class="err">'</span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">Type</span><span class="p">.</span><span class="n">UnionField</span><span class="err">'</span><span class="w"> </span><span class="n">must</span><span class="w"> </span><span class="n">be</span><span class="w"> </span><span class="kr">comptime</span><span class="o">-</span><span class="n">known</span><span class="p">,</span><span class="w"> </span><span class="n">but</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="n">is</span><span class="w"> </span><span class="n">runtime</span><span class="o">-</span><span class="n">known</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cssPropertyInfo</span><span class="p">.</span><span class="n">Union</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
</pre></div>
<p>Zig's "reflection" abilities here are comptime only, so we can't use a
runtime <code>for</code> loop; we must use a comptime <code>inline for</code> loop.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cssPropertyInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">);</span>
<span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cssPropertyInfo</span><span class="p">.</span><span class="n">Union</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnknownProperty</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>As far as I understand it, this loop is basically unrolled and the
generated code would look a lot like our hard-coded initial version.</p>
<p>That is, it would probably look something like this:</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cssPropertyInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="s">"background"</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="s">"background"</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="s">"color"</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="s">"color"</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="s">"unknown"</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="s">"unknown"</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnknownProperty</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Again, that's just how I imagine the compiler generates code from the
union field reflection and the <code>inline for</code> over the fields.</p>
<p>Now try compiling the <code>inline for</code> version. I get this:</p>
<div class="highlight"><pre><span></span><span class="go">main.zig:17:58: error: expected type 'void', found '[]const u8'</span>
<span class="go"> return @unionInit(CSSProperty, u_field.name, value);</span>
</pre></div>
<p>Thinking about the generated code makes it especially clear what's
happening. We have an <code>unknown</code> field in there that has a <code>void</code>
type. You can't assign a string to void.</p>
<p>We know at runtime that the condition where that happens should be
impossible because the user shouldn't enter <code>unknown</code> as a property
name. (Though now that I write this, I see they actually could. But
let's pretend they wouldn't.)</p>
<p>So the problem isn't a runtime failure but a comptime type-checking
failure.</p>
<p>Thankfully we can work around this with comptime conditionals.</p>
<p>If we wrap our current condition in an additional conditional that is
evaluated at comptime and filters out the <code>unknown</code> pass of the
<code>inline for</code> loop, the compiler shouldn't generate any code trying to
assign to the <code>unknown</code> field.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cssPropertyInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">);</span>
<span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cssPropertyInfo</span><span class="p">.</span><span class="n">Union</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="kr">comptime</span><span class="w"> </span><span class="o">!</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="s">"unknown"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnknownProperty</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>And indeed, if you try to compile it, this works. Since the
conditional is evaluated at compile time, we can imagine the code the
compiler generates is this:</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cssPropertyInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="s">"background"</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="s">"background"</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="s">"color"</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="s">"color"</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnknownProperty</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>The <code>unknown</code> field has been skipped.</p>
<p>In retrospect, I realize that the <code>unknown</code> field probably isn't
even needed. We could eliminate it from the <code>CSSProperty</code> union and
get rid of that comptime conditional. However, sometimes there are in
fact private fields you want to skip. And I wanted to show how to
deal with that case.</p>
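<p>To make the comptime filter easier to poke at, here's a minimal,
self-contained sketch. The <code>CSSProperty</code> definition below is my
assumption of what the union looks like, the sample inputs are made up,
and I've used <code>std.meta.fields</code> to get the field list rather
than spelling out the <code>@typeInfo</code> access.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Assumed shape of the union; the real one is defined earlier in the post.
const CSSProperty = union(enum) {
    unknown: void,
    color: []const u8,
    background: []const u8,
};

fn match_property(name: []const u8, value: []const u8) !CSSProperty {
    // std.meta.fields yields the union's field list at comptime.
    inline for (std.meta.fields(CSSProperty)) |u_field| {
        // Comptime filter: no branch is generated for `unknown`.
        if (comptime !std.mem.eql(u8, u_field.name, "unknown")) {
            if (std.mem.eql(u8, u_field.name, name)) {
                return @unionInit(CSSProperty, u_field.name, value);
            }
        }
    }
    return error.UnknownProperty;
}

pub fn main() void {
    // Made-up inputs: one valid name, one invalid, and the filtered field.
    const names = [_][]const u8{ "color", "big", "unknown" };
    for (names) |name| {
        if (match_property(name, "blue")) |property| {
            std.debug.print("{s}: matched field {s}\n", .{ name, @tagName(property) });
        } else |err| {
            std.debug.print("{s}: {s}\n", .{ name, @errorName(err) });
        }
    }
}
</pre></div>
<p>If I'm reading the filter right, a user who does type <code>unknown</code>
as a property name should get <code>error.UnknownProperty</code>, the same
as for <code>big</code>, because no branch for that field exists at runtime.</p>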
<p>For the last bit of metaprogramming, let's talk about displaying
the resulting <code>CSSSheet</code> we'd get after parsing.</p>
<h3 id="<code>sheet.display()</code>"><code>sheet.display()</code></h3><p>If we didn't have metaprogramming and wanted to display the sheet,
we'd have to switch on every possible union field.</p>
<p>Like so (I've modified the <code>CSSSheet</code> struct definition so it includes this method):</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">display</span><span class="p">(</span><span class="n">sheet</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">CSSSheet</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">sheet</span><span class="p">.</span><span class="n">rules</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">rule</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"selector: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">rule</span><span class="p">.</span><span class="n">selector</span><span class="p">});</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">rule</span><span class="p">.</span><span class="n">properties</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">property</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">property</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">unknown</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">color</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">color_value</span><span class="o">|</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">" color: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">color_value</span><span class="p">}),</span>
<span class="w"> </span><span class="p">.</span><span class="n">background</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">background_value</span><span class="o">|</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">" background: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">background_value</span><span class="p">}),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>This is already a little annoying and could get unwieldy as we add
fields to the <code>CSSProperty</code> union.</p>
<p>Instead we can use the <code>inline for
(@typeInfo(CSSProperty).Union.fields) |u_field|</code> method to iterate
over all fields, skip the <code>unknown</code> field at comptime, and print
the field name and value when the field name matches the active tag of
the <code>property</code> union, which we get from the <code>@tagName</code> builtin.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">display</span><span class="p">(</span><span class="n">sheet</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">CSSSheet</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">sheet</span><span class="p">.</span><span class="n">rules</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">rule</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"selector: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">rule</span><span class="p">.</span><span class="n">selector</span><span class="p">});</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">rule</span><span class="p">.</span><span class="n">properties</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">property</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">).</span><span class="n">Union</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="kr">comptime</span><span class="w"> </span><span class="o">!</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="s">"unknown"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="nb">@tagName</span><span class="p">(</span><span class="n">property</span><span class="p">)))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">" {s}: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="nb">@tagName</span><span class="p">(</span><span class="n">property</span><span class="p">),</span>
<span class="w"> </span><span class="nb">@field</span><span class="p">(</span><span class="n">property</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">),</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
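<p>As an aside, a different technique from the one above: I believe a
<code>switch</code> with an <code>inline else</code> prong can do the same
job without comparing tag names as strings at runtime. This is only a
sketch. It assumes a Zig version with inline switch prongs, the
<code>CSSProperty</code> definition is my assumption of the union's shape,
and <code>display_property</code> is a hypothetical helper rather than
part of <code>CSSSheet</code>.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Assumed shape of the union; the real one is defined earlier in the post.
const CSSProperty = union(enum) {
    unknown: void,
    color: []const u8,
    background: []const u8,
};

// Hypothetical helper, not part of the post's CSSSheet.
fn display_property(property: CSSProperty) void {
    switch (property) {
        // The void payload can't be formatted with {s}, so handle it on its own.
        .unknown =&gt; {},
        // One prong is generated per remaining field; `value` is that field's payload.
        inline else =&gt; |value| std.debug.print("  {s}: {s}\n", .{ @tagName(property), value }),
    }
}

pub fn main() void {
    const properties = [_]CSSProperty{
        .{ .background = "black" },
        .{ .color = "white" },
    };
    for (properties) |property| {
        display_property(property);
    }
}
</pre></div>
<p>The tradeoff is that fields to skip have to be listed as explicit prongs,
whereas the <code>inline for</code> version can filter them with any
comptime condition.</p>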
<h3 id="<code>main</code>"><code>main</code></h3><p>Finally, we pull it all together with a little <code>main</code> function.</p>
<div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">arena</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// Let's read in a CSS file.</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">process</span><span class="p">.</span><span class="n">args</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// Skips the program name.</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">file_name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">f</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">file_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">f</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">cwd</span><span class="p">().</span><span class="n">openFile</span><span class="p">(</span><span class="n">file_name</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">getEndPos</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">css_file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">file_size</span><span class="p">);</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">readAll</span><span class="p">(</span><span class="n">css_file</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">sheet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="o">&</span><span class="n">arena</span><span class="p">,</span><span class="w"> </span><span class="n">css_file</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="n">sheet</span><span class="p">.</span><span class="n">display</span><span class="p">();</span>
<span class="p">}</span>
</pre></div>
<p>And try it against some tests.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig
<span class="gp">$ </span>cat<span class="w"> </span>tests/basic.css
<span class="go">div {</span>
<span class="go"> background: white;</span>
<span class="go">}</span>
<span class="gp">$ </span>./main<span class="w"> </span>tests/basic.css
<span class="go">selector: div</span>
<span class="go"> background: white</span>
</pre></div>
<p>Nice! Let's try it against a more complex test.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>cat<span class="w"> </span>tests/multiple-blocks.css
<span class="go">div {</span>
<span class="go"> background: black;</span>
<span class="go"> color: white;</span>
<span class="go">}</span>
<span class="go">a {</span>
<span class="go"> color: blue;</span>
<span class="go">}</span>
<span class="gp">$ </span>./main<span class="w"> </span>tests/multiple-blocks.css
<span class="go">selector: div</span>
<span class="go"> background: black</span>
<span class="go"> color: white</span>
<span class="go">selector: a</span>
<span class="go"> color: blue</span>
</pre></div>
<p>Awesome. And against a bad CSS sheet:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>cat<span class="w"> </span>tests/bad-property.css
<span class="go">a {</span>
<span class="go"> big: pink;</span>
<span class="go">}</span>
<span class="gp">$ </span>./main<span class="w"> </span>tests/bad-property.css
<span class="go">Error at line 2, column 4. Unknown property: 'big'.</span>
<span class="go"> big: pink;</span>
<span class="go"> ^ Near here.</span>
</pre></div>
<p>We've got it!</p>
<h3 id="addendum:-<code>@field</code>">Addendum: <code>@field</code></h3><p>The docs were quite clear about using <code>@field(object, fieldName)</code> to
access the value of an <code>object</code> of type <code>@TypeOf(object)</code> at field
<code>fieldName</code>.</p>
<p>And the docs do mention that <code>@field()</code> can be used as an LHS (the
left-hand side of an assignment), but that only really struck me when I
was browsing the Zig JSON code, like at <a href="https://github.com/ziglang/zig/blob/master/lib/std/json/static.zig#L307">line
307</a>.</p>
<p>I didn't use that in this little project but I've used it elsewhere,
so I wanted to call this LHS behavior out.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a short post on parsing CSS as a way to motivate some basic exploration of metaprogramming in Zig.<br><br>I heavily referenced Zig's builtin JSON parser when learning this. And it is referenced multiple times in the post as well.<a href="https://t.co/CX6jXSLGiR">https://t.co/CX6jXSLGiR</a> <a href="https://t.co/jAJJZ0pONQ">pic.twitter.com/jAJJZ0pONQ</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1670868544953647129?ref_src=twsrc%5Etfw">June 19, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2023-06-19-metaprogramming-in-zig-and-parsing-css.htmlMon, 19 Jun 2023 00:00:00 +0000
- Implementing the Raft distributed consensus protocol in Gohttp://notes.eatonphil.com/2023-05-25-raft.html<p>As part of bringing myself up-to-speed after joining <a href="https://tigerbeetle.com/">TigerBeetle</a>, I
wanted some background on how distributed consensus and replicated state
machine protocols work. TigerBeetle uses <a href="https://pmg.csail.mit.edu/papers/vr-revisited.pdf">Viewstamped
Replication</a>. But I
wanted to understand all popular protocols and I decided to start with
<a href="https://raft.github.io/">Raft</a>.</p>
<p>We'll implement two key components of Raft in this post (leader
election and log replication). Around 1k lines of Go. It took me
around 7 months of sporadic studying to come to (what I hope is) an
understanding of the basics.</p>
<p><strong>Disclaimer</strong>: I'm not an expert. My implementation isn't yet hooked
up to <a href="https://github.com/jepsen-io/jepsen">Jepsen</a>. I've run it
through a mix of
<a href="https://github.com/eatonphil/goraft/tree/main#distributed-key-value-store-api">manual</a> and
<a href="https://github.com/eatonphil/goraft/tree/main/cmd/stress">automated
tests</a> and
it seems generally correct. This is not intended to be used in
production. It's just for my education.</p>
<p>All code for this project is <a href="https://github.com/eatonphil/goraft">available on GitHub</a>.</p>
<p>Let's dig in!</p>
<h3 id="the-algorithm">The algorithm</h3><p><a href="https://raft.github.io/raft.pdf">The Raft paper</a> itself is quite
readable. Give it a read and you'll get the basic idea.</p>
<p>The gist is that nodes in a cluster conduct elections to pick
a leader. Users of the Raft cluster send messages to the leader. The
leader passes the message to followers and waits for a majority to
store the message. Once the message is committed (majority consensus
has been reached), the message is applied to a state machine the user
supplies. Followers learn about the latest committed message from the
leader and apply each new committed message to their local
user-supplied state machine.</p>
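<p>As a rough illustration of what "majority" means above, here is a
minimal sketch. The names <code>quorum</code> and <code>isCommitted</code> are made up for
this example and are not part of the library built in this post; the
sketch also assumes the leader counts itself among the nodes that have
stored the message.</p>

```go
package main

import "fmt"

// quorum returns the smallest number of nodes that forms a strict
// majority of a cluster with clusterSize members.
// (Hypothetical helper, for illustration only.)
func quorum(clusterSize int) int {
	return clusterSize/2 + 1
}

// isCommitted reports whether a message stored on `stored` nodes
// (leader included) has reached majority consensus.
func isCommitted(stored, clusterSize int) bool {
	return stored >= quorum(clusterSize)
}

func main() {
	for _, size := range []int{3, 4, 5} {
		fmt.Printf("cluster of %d needs %d copies\n", size, quorum(size))
	}

	// Leader plus one follower in a three-node cluster is a majority.
	fmt.Println(isCommitted(2, 3))

	// The same two copies are not enough in a five-node cluster.
	fmt.Println(isCommitted(2, 5))
}
```

<p>Note that a four-node cluster needs three copies, the same as a
five-node cluster, which is one reason odd cluster sizes are common.</p>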
<p>There's more to it including reconfiguration and snapshotting, which I
won't get into in this post. But you can get the gist of Raft by
thinking about 1) leader election and 2) replicated logs powering
replicated state machines.</p>
<h3 id="modeling-with-state-machines-and-key-value-stores">Modeling with state machines and key-value stores</h3><p>I've written before about how you can <a href="https://notes.eatonphil.com/minimal-key-value-store-with-hashicorp-raft.html">build a key-value store on top
of
Raft</a>. How
you can <a href="https://notes.eatonphil.com/zigrocks-sql.html">build a SQL database on top of a key-value
store</a>. And how you can
build a <a href="https://notes.eatonphil.com/distributed-postgres.html">distributed SQL database on top of
Raft</a>.</p>
<p>This post will start quite similarly to that first post, except
that we won't stop at the Raft layer.</p>
<h3 id="a-distributed-key-value-store">A distributed key-value store</h3><p>To build on top of the Raft library we'll build, we need to create a
state machine and commands that are sent to the state machine.</p>
<p>Our state machine will have two operations: get a value from a key,
and set a key to a value.</p>
<p>This will go in <code>cmd/kvapi/main.go</code>.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"bytes"</span>
<span class="w"> </span><span class="nx">crypto</span><span class="w"> </span><span class="s">"crypto/rand"</span>
<span class="w"> </span><span class="s">"encoding/binary"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"log"</span>
<span class="w"> </span><span class="s">"math/rand"</span>
<span class="w"> </span><span class="s">"net/http"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"strconv"</span>
<span class="w"> </span><span class="s">"strings"</span>
<span class="w"> </span><span class="s">"sync"</span>
<span class="w"> </span><span class="s">"github.com/eatonphil/goraft"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">statemachine</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span>
<span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="kt">int</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">commandKind</span><span class="w"> </span><span class="kt">uint8</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">setCommand</span><span class="w"> </span><span class="nx">commandKind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>
<span class="w"> </span><span class="nx">getCommand</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">command</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">commandKind</span>
<span class="w"> </span><span class="nx">key</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">statemachine</span><span class="p">)</span><span class="w"> </span><span class="nx">Apply</span><span class="p">(</span><span class="nx">cmd</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">decodeCommand</span><span class="p">(</span><span class="nx">cmd</span><span class="p">)</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">setCommand</span><span class="p">:</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Store</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">getCommand</span><span class="p">:</span>
<span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Load</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Key not found"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">value</span><span class="p">.(</span><span class="kt">string</span><span class="p">)),</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unknown command: %x"</span><span class="p">,</span><span class="w"> </span><span class="nx">cmd</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>But the Raft library we'll build needs to deal with various state
machines. So commands passed from the user into the Raft cluster must
be serialized to bytes.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">encodeCommand</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="nx">command</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">NewBuffer</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">WriteByte</span><span class="p">(</span><span class="nb">uint8</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="p">)))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">WriteString</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">)))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">WriteString</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">Bytes</span><span class="p">()</span>
<span class="p">}</span>
</pre></div>
<p>And the <code>Apply()</code> function from above needs to be able to decode the
bytes:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">decodeCommand</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="nx">command</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="nx">command</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">commandKind</span><span class="p">(</span><span class="nx">msg</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="w"> </span><span class="nx">keyLen</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">msg</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="mi">9</span><span class="p">])</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">msg</span><span class="p">[</span><span class="mi">9</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">9</span><span class="o">+</span><span class="nx">keyLen</span><span class="p">])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">setCommand</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">valLen</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">msg</span><span class="p">[</span><span class="mi">9</span><span class="o">+</span><span class="nx">keyLen</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">9</span><span class="o">+</span><span class="nx">keyLen</span><span class="o">+</span><span class="mi">8</span><span class="p">])</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">msg</span><span class="p">[</span><span class="mi">9</span><span class="o">+</span><span class="nx">keyLen</span><span class="o">+</span><span class="mi">8</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">9</span><span class="o">+</span><span class="nx">keyLen</span><span class="o">+</span><span class="mi">8</span><span class="o">+</span><span class="nx">valLen</span><span class="p">])</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span>
<span class="p">}</span>
</pre></div>
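<p>To make the wire format concrete, here is a standalone round-trip
sketch. It is a variation on the functions above, not the code from the
repo: the names <code>encode</code> and <code>decode</code> are made up, and <code>decode</code> returns an
error on short input where the version above would likely panic on an
out-of-range slice. The layout is one byte for the kind, then an 8-byte
little-endian length followed by the bytes, for the key and then the
value.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type commandKind uint8

const (
	setCommand commandKind = iota
	getCommand
)

type command struct {
	kind  commandKind
	key   string
	value string
}

// encode lays a command out as:
// [1 byte kind][8 byte key length][key][8 byte value length][value]
func encode(c command) []byte {
	var n [8]byte
	msg := []byte{byte(c.kind)}
	binary.LittleEndian.PutUint64(n[:], uint64(len(c.key)))
	msg = append(msg, n[:]...)
	msg = append(msg, c.key...)
	binary.LittleEndian.PutUint64(n[:], uint64(len(c.value)))
	msg = append(msg, n[:]...)
	msg = append(msg, c.value...)
	return msg
}

// decode reverses encode. Unlike the version in the post, it checks
// lengths and returns an error rather than slicing out of range.
func decode(msg []byte) (command, error) {
	var c command
	if len(msg) < 9 {
		return c, fmt.Errorf("message too short: %d bytes", len(msg))
	}
	c.kind = commandKind(msg[0])
	keyLen := binary.LittleEndian.Uint64(msg[1:9])
	rest := msg[9:]
	if uint64(len(rest)) < keyLen {
		return c, fmt.Errorf("truncated key")
	}
	c.key = string(rest[:keyLen])
	rest = rest[keyLen:]
	if c.kind != setCommand {
		// Only set commands carry a value we care about.
		return c, nil
	}
	if len(rest) < 8 {
		return c, fmt.Errorf("missing value length")
	}
	valLen := binary.LittleEndian.Uint64(rest[:8])
	rest = rest[8:]
	if uint64(len(rest)) < valLen {
		return c, fmt.Errorf("truncated value")
	}
	c.value = string(rest[:valLen])
	return c, nil
}

func main() {
	msg := encode(command{kind: setCommand, key: "x", value: "1"})
	fmt.Println(len(msg)) // 1 + 8 + 1 + 8 + 1 = 19 bytes

	c, err := decode(msg)
	fmt.Println(c.key, c.value, err)

	// A truncated message is reported as an error.
	_, err = decode(msg[:5])
	fmt.Println(err)
}
```

<p>Whether to panic or return an error on malformed bytes is a design
choice. Since only our own code produces these messages, panicking is
defensible for a learning project.</p>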
<h4 id="http-api">HTTP API</h4><p>Now that we've modeled the key-value store as a state machine, let's
build the HTTP endpoints that allow the user to operate the state
machine through the Raft cluster.</p>
<p>First, let's implement the <code>set</code> operation. We need to grab the key
and value the user passes in and call <code>Apply()</code> on the Raft
cluster. Calling <code>Apply()</code> on the Raft cluster will eventually call
the <code>Apply()</code> function we just wrote, but not until the message sent
to the Raft cluster is actually replicated.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">httpServer</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">raft</span><span class="w"> </span><span class="o">*</span><span class="nx">goraft</span><span class="p">.</span><span class="nx">Server</span>
<span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span>
<span class="p">}</span>
<span class="c1">// Example:</span>
<span class="c1">//</span>
<span class="c1">// curl http://localhost:2020/set?key=x&value=1</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">hs</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">)</span><span class="w"> </span><span class="nx">setHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="nx">command</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">setCommand</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">"key"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">"value"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Apply</span><span class="p">([][]</span><span class="kt">byte</span><span class="p">{</span><span class="nx">encodeCommand</span><span class="p">(</span><span class="nx">c</span><span class="p">)})</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Could not write key-value: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>To reiterate, we tell the Raft cluster we want this message
replicated. The message contains the operation type (<code>set</code>) and the
operation details (<code>key</code> and <code>value</code>). These messages are custom to
the state machine we wrote. And they will be interpreted by the state
machine we wrote, on each node in the cluster.</p>
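<p>The reason this works is that every node applies the same messages
in the same order to a deterministic state machine. A minimal sketch of
that idea, with no networking and no Raft at all, might look like this.
The <code>op</code>, <code>replica</code>, and <code>replay</code> names are hypothetical stand-ins, and a
plain map is used in place of <code>sync.Map</code> to keep it short.</p>

```go
package main

import "fmt"

// op is a hypothetical, already-decoded set command.
type op struct {
	key, value string
}

// replica stands in for one node's local state machine.
type replica struct {
	db map[string]string
}

func newReplica() *replica {
	return &replica{db: map[string]string{}}
}

// apply is deterministic: the same op always has the same effect.
func (r *replica) apply(o op) {
	r.db[o.key] = o.value
}

// replay applies a committed log, in order, to a fresh replica.
func replay(log []op) *replica {
	r := newReplica()
	for _, o := range log {
		r.apply(o)
	}
	return r
}

func main() {
	// The same committed log, delivered to two independent nodes.
	log := []op{{"x", "1"}, {"y", "2"}, {"x", "3"}}
	a, b := replay(log), replay(log)

	// Both nodes end up with identical state.
	fmt.Println(a.db["x"], b.db["x"])
	fmt.Println(a.db["y"], b.db["y"])
}
```

<p>If two nodes applied the log in different orders, the two writes to
<code>x</code> could leave them disagreeing, which is why the ordering the log
provides matters.</p>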
<p>Next we handle <code>get</code>-ing values from the cluster. There are two ways
to do this. We already embed a local copy of the distributed key-value
map. We could just read from that map in the current process. But it
might not be up-to-date or correct. It would be fast to read
though. And convenient for debugging.</p>
<p>But the only <a href="https://github.com/etcd-io/etcd/issues/741"><em>correct</em> way to read from a Raft
cluster</a> is to pass the
read through the log replication too.</p>
<p>So we'll support both.</p>
<div class="highlight"><pre><span></span><span class="c1">// Example:</span>
<span class="c1">//</span>
<span class="c1">// curl http://localhost:2020/get?key=x</span>
<span class="c1">// 1</span>
<span class="c1">// curl http://localhost:2020/get?key=x&relaxed=true # Skips consensus for the read.</span>
<span class="c1">// 1</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">hs</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">)</span><span class="w"> </span><span class="nx">getHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="nx">command</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">getCommand</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">"key"</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">"relaxed"</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"true"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Load</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Key not found"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">v</span><span class="p">.(</span><span class="kt">string</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="p">[]</span><span class="nx">goraft</span><span class="p">.</span><span class="nx">ApplyResult</span>
<span class="w"> </span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Apply</span><span class="p">([][]</span><span class="kt">byte</span><span class="p">{</span><span class="nx">encodeCommand</span><span class="p">(</span><span class="nx">c</span><span class="p">)})</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">results</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Expected single response from Raft, got: %d."</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">results</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">results</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">Error</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">results</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">Error</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">results</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">Result</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Could not get key-value: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">written</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">written</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">value</span><span class="p">[</span><span class="nx">written</span><span class="p">:])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Could not encode key-value in http response: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">written</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">n</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<h4 id="main">Main</h4><p>Now that we've set up our custom state machine and our HTTP API for
interacting with the Raft cluster, we'll tie them together by reading
configuration from the command line and then starting the Raft
node and the HTTP API.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cluster</span><span class="w"> </span><span class="p">[]</span><span class="nx">goraft</span><span class="p">.</span><span class="nx">ClusterMember</span>
<span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="kt">int</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">address</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">http</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">config</span><span class="p">{}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--node"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span>
<span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">node</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Expected $value to be a valid integer in `--node $value`, got: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">node</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--http"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">http</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--cluster"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cluster</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">clusterEntry</span><span class="w"> </span><span class="nx">goraft</span><span class="p">.</span><span class="nx">ClusterMember</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">cluster</span><span class="p">,</span><span class="w"> </span><span class="s">";"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">idAddress</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">part</span><span class="p">,</span><span class="w"> </span><span class="s">","</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="nx">clusterEntry</span><span class="p">.</span><span class="nx">Id</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseUint</span><span class="p">(</span><span class="nx">idAddress</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Expected $id to be a valid integer in `--cluster $id,$ip`, got: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">idAddress</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">clusterEntry</span><span class="p">.</span><span class="nx">Address</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">idAddress</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">cluster</span><span class="p">,</span><span class="w"> </span><span class="nx">clusterEntry</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Missing required parameter: --node $index"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">http</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Missing required parameter: --http $address"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">cluster</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Missing required parameter: --cluster $node1Id,$node1Address;...;$nodeNId,$nodeNAddress"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">cfg</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">[</span><span class="mi">8</span><span class="p">]</span><span class="kt">byte</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">crypto</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">b</span><span class="p">[:])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"cannot seed math/rand package with cryptographically secure random number generator"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">rand</span><span class="p">.</span><span class="nx">Seed</span><span class="p">(</span><span class="nb">int64</span><span class="p">(</span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">b</span><span class="p">[:])))</span>
<span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">sm</span><span class="w"> </span><span class="nx">statemachine</span>
<span class="w"> </span><span class="nx">sm</span><span class="p">.</span><span class="nx">db</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&</span><span class="nx">db</span>
<span class="w"> </span><span class="nx">sm</span><span class="p">.</span><span class="nx">server</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">index</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">goraft</span><span class="p">.</span><span class="nx">NewServer</span><span class="p">(</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">cluster</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">sm</span><span class="p">,</span><span class="w"> </span><span class="s">"."</span><span class="p">,</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span>
<span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">Start</span><span class="p">()</span>
<span class="w"> </span><span class="nx">hs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">{</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">db</span><span class="p">}</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">"/set"</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">setHandler</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">"/get"</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">getHandler</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">http</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
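<p>The <code>--cluster</code> value handled in <code>getConfig</code> is a list of
<code>$id,$address</code> pairs separated by <code>;</code>. As a minimal sketch, that
parsing step can be pulled out on its own. The <code>member</code> type and
<code>parseCluster</code> function below are hypothetical names, not part of the
project, and the addresses are made-up sample values. Unlike the loop
above, this version checks that each pair really has two parts and
returns an error instead of exiting.</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// member is a hypothetical stand-in for goraft.ClusterMember, keeping
// only the two fields that come from the command line.
type member struct {
	Id      uint64
	Address string
}

// parseCluster turns a string such as "1,:3030;2,:3031" into a slice
// of members. It returns an error rather than calling log.Fatalf so
// the caller decides how to report bad input.
func parseCluster(s string) ([]member, error) {
	var members []member
	for _, part := range strings.Split(s, ";") {
		idAddress := strings.Split(part, ",")
		if len(idAddress) != 2 {
			return nil, fmt.Errorf("expected $id,$address, got: %q", part)
		}
		id, err := strconv.ParseUint(idAddress[0], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("expected $id to be a valid integer, got: %q", idAddress[0])
		}
		members = append(members, member{Id: id, Address: idAddress[1]})
	}
	return members, nil
}

func main() {
	// Sample cluster of three nodes; the ports are arbitrary.
	members, err := parseCluster("1,:3030;2,:3031;3,:3032")
	if err != nil {
		panic(err)
	}
	for _, m := range members {
		fmt.Printf("id=%d address=%s\n", m.Id, m.Address)
	}
}
```

<p>Returning an error keeps the parsing testable; <code>getConfig</code> could
still call <code>log.Fatalf</code> on the result if exiting is the desired
behavior.</p>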
<p>And that's it for the easy part: a distributed key-value store on top
of a Raft cluster.</p>
<p>Next we need to implement Raft.</p>
<h3 id="a-raft-server">A Raft server</h3><p>If we take a look at Figure 2 in the Raft paper, we get an idea of
all the state we need to model.</p>
<p><img src="/assets/raft-figure-2.png" alt="Raft Figure 2"></p>
<p>We'll dig into the details as we go. But for now let's turn that model
into a few Go types. This goes in <code>raft.go</code> in the base directory,
not <code>cmd/kvapi</code>.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">goraft</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"bufio"</span>
<span class="w"> </span><span class="s">"context"</span>
<span class="w"> </span><span class="s">"encoding/binary"</span>
<span class="w"> </span><span class="s">"errors"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"io"</span>
<span class="w"> </span><span class="s">"math/rand"</span>
<span class="w"> </span><span class="s">"net"</span>
<span class="w"> </span><span class="s">"net/http"</span>
<span class="w"> </span><span class="s">"net/rpc"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"path"</span>
<span class="w"> </span><span class="s">"sync"</span>
<span class="w"> </span><span class="s">"time"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">StateMachine</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Apply</span><span class="p">(</span><span class="nx">cmd</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">ApplyResult</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Result</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="w"> </span><span class="nx">Error</span><span class="w"> </span><span class="kt">error</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">Entry</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Command</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="w"> </span><span class="nx">Term</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// Set by the leader so it can learn about the result of</span>
<span class="w"> </span><span class="c1">// applying this command to the state machine</span>
<span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="kd">chan</span><span class="w"> </span><span class="nx">ApplyResult</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">ClusterMember</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Id</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="nx">Address</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="c1">// Index of the next log entry to send</span>
<span class="w"> </span><span class="nx">nextIndex</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// Highest log entry known to be replicated</span>
<span class="w"> </span><span class="nx">matchIndex</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// Who was voted for in the most recent term</span>
<span class="w"> </span><span class="nx">votedFor</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// TCP connection</span>
<span class="w"> </span><span class="nx">rpcClient</span><span class="w"> </span><span class="o">*</span><span class="nx">rpc</span><span class="p">.</span><span class="nx">Client</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">ServerState</span><span class="w"> </span><span class="kt">string</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">leaderState</span><span class="w"> </span><span class="nx">ServerState</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"leader"</span>
<span class="w"> </span><span class="nx">followerState</span><span class="w"> </span><span class="nx">ServerState</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"follower"</span>
<span class="w"> </span><span class="nx">candidateState</span><span class="w"> </span><span class="nx">ServerState</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"candidate"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">Server</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// These variables are for shutting down.</span>
<span class="w"> </span><span class="nx">done</span><span class="w"> </span><span class="kt">bool</span>
<span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Server</span>
<span class="w"> </span><span class="nx">Debug</span><span class="w"> </span><span class="kt">bool</span>
<span class="w"> </span><span class="nx">mu</span><span class="w"> </span><span class="nx">sync</span><span class="p">.</span><span class="nx">Mutex</span>
<span class="w"> </span><span class="c1">// ----------- PERSISTENT STATE -----------</span>
<span class="w"> </span><span class="c1">// The current term</span>
<span class="w"> </span><span class="nx">currentTerm</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="nx">log</span><span class="w"> </span><span class="p">[]</span><span class="nx">Entry</span>
<span class="w"> </span><span class="c1">// votedFor is stored in `cluster []ClusterMember` below,</span>
<span class="w"> </span><span class="c1">// mapped by `clusterIndex` below</span>
<span class="w"> </span><span class="c1">// ----------- READONLY STATE -----------</span>
<span class="w"> </span><span class="c1">// Unique identifier for this Server</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// The TCP address for RPC</span>
<span class="w"> </span><span class="nx">address</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="c1">// When to start elections after no append entry messages</span>
<span class="w"> </span><span class="nx">electionTimeout</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span>
<span class="w"> </span><span class="c1">// How often to send empty messages</span>
<span class="w"> </span><span class="nx">heartbeatMs</span><span class="w"> </span><span class="kt">int</span>
<span class="w"> </span><span class="c1">// When to next send empty message</span>
<span class="w"> </span><span class="nx">heartbeatTimeout</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span>
<span class="w"> </span><span class="c1">// User-provided state machine</span>
<span class="w"> </span><span class="nx">statemachine</span><span class="w"> </span><span class="nx">StateMachine</span>
<span class="w"> </span><span class="c1">// Metadata directory</span>
<span class="w"> </span><span class="nx">metadataDir</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="c1">// Metadata store</span>
<span class="w"> </span><span class="nx">fd</span><span class="w"> </span><span class="o">*</span><span class="nx">os</span><span class="p">.</span><span class="nx">File</span>
<span class="w"> </span><span class="c1">// ----------- VOLATILE STATE -----------</span>
<span class="w"> </span><span class="c1">// Index of highest log entry known to be committed</span>
<span class="w"> </span><span class="nx">commitIndex</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// Index of highest log entry applied to state machine</span>
<span class="w"> </span><span class="nx">lastApplied</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// Candidate, follower, or leader</span>
<span class="w"> </span><span class="nx">state</span><span class="w"> </span><span class="nx">ServerState</span>
<span class="w"> </span><span class="c1">// Servers in the cluster, including this one</span>
<span class="w"> </span><span class="nx">cluster</span><span class="w"> </span><span class="p">[]</span><span class="nx">ClusterMember</span>
<span class="w"> </span><span class="c1">// Index of this server</span>
<span class="w"> </span><span class="nx">clusterIndex</span><span class="w"> </span><span class="kt">int</span>
<span class="p">}</span>
</pre></div>
<p>And let's build a constructor to initialize the state for all servers
in the cluster, as well as local server state.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">NewServer</span><span class="p">(</span>
<span class="w"> </span><span class="nx">clusterConfig</span><span class="w"> </span><span class="p">[]</span><span class="nx">ClusterMember</span><span class="p">,</span>
<span class="w"> </span><span class="nx">statemachine</span><span class="w"> </span><span class="nx">StateMachine</span><span class="p">,</span>
<span class="w"> </span><span class="nx">metadataDir</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span>
<span class="w"> </span><span class="nx">clusterIndex</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Explicitly make a copy of the cluster because we'll be</span>
<span class="w"> </span><span class="c1">// modifying it in this server.</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">cluster</span><span class="w"> </span><span class="p">[]</span><span class="nx">ClusterMember</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">clusterConfig</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Id</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Id must not be 0."</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cluster</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">cluster</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Server</span><span class="p">{</span>
<span class="w"> </span><span class="nx">id</span><span class="p">:</span><span class="w"> </span><span class="nx">cluster</span><span class="p">[</span><span class="nx">clusterIndex</span><span class="p">].</span><span class="nx">Id</span><span class="p">,</span>
<span class="w"> </span><span class="nx">address</span><span class="p">:</span><span class="w"> </span><span class="nx">cluster</span><span class="p">[</span><span class="nx">clusterIndex</span><span class="p">].</span><span class="nx">Address</span><span class="p">,</span>
<span class="w"> </span><span class="nx">cluster</span><span class="p">:</span><span class="w"> </span><span class="nx">cluster</span><span class="p">,</span>
<span class="w"> </span><span class="nx">statemachine</span><span class="p">:</span><span class="w"> </span><span class="nx">statemachine</span><span class="p">,</span>
<span class="w"> </span><span class="nx">metadataDir</span><span class="p">:</span><span class="w"> </span><span class="nx">metadataDir</span><span class="p">,</span>
<span class="w"> </span><span class="nx">clusterIndex</span><span class="p">:</span><span class="w"> </span><span class="nx">clusterIndex</span><span class="p">,</span>
<span class="w"> </span><span class="nx">heartbeatMs</span><span class="p">:</span><span class="w"> </span><span class="mi">300</span><span class="p">,</span>
<span class="w"> </span><span class="nx">mu</span><span class="p">:</span><span class="w"> </span><span class="nx">sync</span><span class="p">.</span><span class="nx">Mutex</span><span class="p">{},</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
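<p>To see why the constructor copies the slice rather than storing
<code>clusterConfig</code> directly, here is a minimal sketch. The
<code>ClusterMember</code> below is a trimmed-down stand-in with only two
fields, and <code>copyCluster</code> is a hypothetical helper name used
only for this illustration. Assigning a slice shares its backing array,
so writes through the alias show up in the caller's slice, while writes
to the copy do not.</p>

```go
package main

import "fmt"

// ClusterMember is a trimmed-down stand-in for the type used in this
// post (assumption: only these two fields matter for the example).
type ClusterMember struct {
	Id      uint64
	Address string
}

// copyCluster (hypothetical helper) returns a slice with its own
// backing array, the same way the loop in NewServer does.
func copyCluster(in []ClusterMember) []ClusterMember {
	out := make([]ClusterMember, 0, len(in))
	for _, c := range in {
		out = append(out, c)
	}
	return out
}

func main() {
	config := []ClusterMember{
		{Id: 1, Address: "localhost:2020"},
		{Id: 2, Address: "localhost:2021"},
	}

	aliased := config             // shares config's backing array
	copied := copyCluster(config) // independent backing array

	aliased[0].Address = "changed-through-alias"
	copied[1].Address = "changed-in-copy"

	fmt.Println(config[0].Address) // changed-through-alias
	fmt.Println(config[1].Address) // localhost:2021
	fmt.Println(copied[1].Address) // changed-in-copy
}
```

<p>If several servers are built in one process from the same
configuration slice, the copy keeps each server's bookkeeping from
leaking into the others.</p>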
<p>And add a few debugging and assertion helpers.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">debugmsg</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"%s [Id: %d, Term: %d] %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">Format</span><span class="p">(</span><span class="nx">time</span><span class="p">.</span><span class="nx">RFC3339Nano</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">s</span><span class="p">.</span><span class="nx">Debug</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">debugmsg</span><span class="p">(</span><span class="nx">msg</span><span class="p">))</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">debugf</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">...</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">s</span><span class="p">.</span><span class="nx">Debug</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="o">...</span><span class="p">))</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">warn</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"[WARN] "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugmsg</span><span class="p">(</span><span class="nx">msg</span><span class="p">))</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">warnf</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">...</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">warn</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="o">...</span><span class="p">))</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">Assert</span><span class="p">[</span><span class="nx">T</span><span class="w"> </span><span class="kt">comparable</span><span class="p">](</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"%s. Got a = %#v, b = %#v"</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">Server_assert</span><span class="p">[</span><span class="nx">T</span><span class="w"> </span><span class="kt">comparable</span><span class="p">](</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Assert</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">debugmsg</span><span class="p">(</span><span class="nx">msg</span><span class="p">),</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<h3 id="persistent-state">Persistent state</h3><p>As Figure 2 says, <code>currentTerm</code>, <code>log</code>, and <code>votedFor</code> must be
persisted to disk as they're edited.</p>
<p>I like to start by doing the stupidest thing possible. So in the
first version of this project I used <code>encoding/gob</code> to write these
three fields to disk every time <code>s.persist()</code> was called.</p>
<p>Here is what this first version looked like:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">persist</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Truncate</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="nx">enc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gob</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">enc</span><span class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">PersistentState</span><span class="p">{</span>
<span class="w"> </span><span class="nx">CurrentTerm</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Log</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span>
<span class="w"> </span><span class="nx">VotedFor</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">votedFor</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Sync</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"Persisted. Term: %d. Log Len: %d. Voted For: %s."</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">votedFor</span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
<p>But doing so means the cost of every <code>s.persist()</code> call grows with
the size of the log, since the entire log is rewritten each time. And
that was horrible for throughput.</p>
<p>I also noticed that <code>encoding/gob</code> is pretty inefficient.</p>
<p>For a simple struct like:</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">X</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">A</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="nx">B</span><span class="w"> </span><span class="p">[]</span><span class="kt">uint64</span>
<span class="w"> </span><span class="nx">C</span><span class="w"> </span><span class="kt">bool</span>
<span class="p">}</span>
</pre></div>
<p><code>encoding/gob</code> uses <a href="https://play.golang.com/p/TUe9TDgaZOw">68 bytes to store that data when B has two
entries</a>. If we wrote the
encoder/decoder ourselves we could store that struct in 33 bytes (<code>8
(sizeof(A)) + 8 (sizeof(len(B))) + 16 (len(B) * sizeof(B[0])) + 1
(sizeof(C))</code>).</p>
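<p>As a rough sketch of that arithmetic, here is one way a hand-written
encoder for <code>X</code> could look. The function names
<code>encodeX</code> and <code>decodeX</code> and the little-endian
layout are assumptions made for this example, not code from the project.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type X struct {
	A uint64
	B []uint64
	C bool
}

// encodeX (hypothetical) lays X out as:
// A (8 bytes), len(B) (8 bytes), each element of B (8 bytes each), C (1 byte).
func encodeX(x X) []byte {
	buf := make([]byte, 8+8+8*len(x.B)+1)
	binary.LittleEndian.PutUint64(buf[0:8], x.A)
	binary.LittleEndian.PutUint64(buf[8:16], uint64(len(x.B)))
	off := 16
	for _, b := range x.B {
		binary.LittleEndian.PutUint64(buf[off:off+8], b)
		off += 8
	}
	if x.C {
		buf[off] = 1
	}
	return buf
}

// decodeX (hypothetical) reverses encodeX. It assumes buf was produced
// by encodeX and does no bounds validation.
func decodeX(buf []byte) X {
	var x X
	x.A = binary.LittleEndian.Uint64(buf[0:8])
	n := int(binary.LittleEndian.Uint64(buf[8:16]))
	off := 16
	for i := 0; i < n; i++ {
		x.B = append(x.B, binary.LittleEndian.Uint64(buf[off:off+8]))
		off += 8
	}
	x.C = buf[off] == 1
	return x
}

func main() {
	buf := encodeX(X{A: 1, B: []uint64{2, 3}, C: true})
	fmt.Println(len(buf)) // 33
	fmt.Printf("%+v\n", decodeX(buf))
}
```

<p>The saving comes from the reader and writer agreeing on the layout
ahead of time, so nothing about the types has to be stored alongside
the values.</p>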
<p>It's not that <code>encoding/gob</code> is bad. It likely just has different
constraints than we do. A gob stream is self-describing, for example,
so it carries type information alongside the values.</p>
<p>So I decided to swap out <code>encoding/gob</code> for simply binary encoding the
fields and also, importantly, keeping track of exactly how many
entries in the log must be written and only writing that many.</p>
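<p>Writing only the new entries depends on being able to compute where
an entry lives in the file. Here is a small sketch of that arithmetic,
assuming a 4096-byte metadata page followed by fixed-size 128-byte
entries. The helper names <code>firstNewEntry</code> and
<code>entryOffset</code> are made up for this example.</p>

```go
package main

import "fmt"

const (
	pageSize  = 4096 // metadata page at the start of the file (assumed)
	entrySize = 128  // every log entry occupies exactly this many bytes (assumed)
)

// firstNewEntry (hypothetical) returns the index of the first entry that
// still needs writing when the last nNew entries of a log of length
// logLen are new. It clamps at 0 so the whole log is written at most once.
func firstNewEntry(logLen, nNew int) int {
	if nNew > logLen {
		return 0
	}
	return logLen - nNew
}

// entryOffset (hypothetical) returns the byte offset in the file of log entry i.
func entryOffset(i int) int64 {
	return int64(pageSize + entrySize*i)
}

func main() {
	// A log of 1000 entries where only the last 3 are new.
	start := firstNewEntry(1000, 3)
	fmt.Println(start)              // 997
	fmt.Println(entryOffset(start)) // 131712
}
```

<p>With fixed-size entries, appending 3 entries to a 1000-entry log
means seeking once and writing 384 bytes of log data rather than
rewriting all 1000 entries.</p>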
<h4 id="<code>s.persist()</code>"><code>s.persist()</code></h4><p>Here's what that looks like.</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">PAGE_SIZE</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">4096</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">ENTRY_HEADER</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">16</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">ENTRY_SIZE</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">128</span>
<span class="c1">// Must be called within s.mu.Lock()</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">persist</span><span class="p">(</span><span class="nx">writeLog</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="nx">nNewEntries</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">nNewEntries</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">writeLog</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">nNewEntries</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">page</span><span class="w"> </span><span class="p">[</span><span class="nx">PAGE_SIZE</span><span class="p">]</span><span class="kt">byte</span>
<span class="w"> </span><span class="c1">// Bytes 0 - 8: Current term</span>
<span class="w"> </span><span class="c1">// Bytes 8 - 16: Voted for</span>
<span class="w"> </span><span class="c1">// Bytes 16 - 24: Log length</span>
<span class="w"> </span><span class="c1">// Bytes 4096 - N: Log</span>
<span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">PutUint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[:</span><span class="mi">8</span><span class="p">],</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">)</span>
<span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">PutUint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[</span><span class="mi">8</span><span class="p">:</span><span class="mi">16</span><span class="p">],</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getVotedFor</span><span class="p">())</span>
<span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">PutUint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[</span><span class="mi">16</span><span class="p">:</span><span class="mi">24</span><span class="p">],</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)))</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">page</span><span class="p">[:])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">"Wrote full page"</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">PAGE_SIZE</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">writeLog</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">nNewEntries</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">newLogOffset</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">max</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="o">-</span><span class="nx">nNewEntries</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="nb">int64</span><span class="p">(</span><span class="nx">PAGE_SIZE</span><span class="o">+</span><span class="nx">ENTRY_SIZE</span><span class="o">*</span><span class="nx">newLogOffset</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="nx">bw</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.</span><span class="nx">NewWriter</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">entryBytes</span><span class="w"> </span><span class="p">[</span><span class="nx">ENTRY_SIZE</span><span class="p">]</span><span class="kt">byte</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newLogOffset</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">);</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Bytes 0 - 8: Entry term</span>
<span class="w"> </span><span class="c1">// Bytes 8 - 16: Entry command length</span>
<span class="w"> </span><span class="c1">// Bytes 16 - ENTRY_SIZE: Entry command</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Command</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nx">ENTRY_SIZE</span><span class="o">-</span><span class="nx">ENTRY_HEADER</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"Command is too large (%d). Must be at most %d bytes."</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Command</span><span class="p">),</span><span class="w"> </span><span class="nx">ENTRY_SIZE</span><span class="o">-</span><span class="nx">ENTRY_HEADER</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">PutUint64</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[:</span><span class="mi">8</span><span class="p">],</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Term</span><span class="p">)</span>
<span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">PutUint64</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[</span><span class="mi">8</span><span class="p">:</span><span class="mi">16</span><span class="p">],</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Command</span><span class="p">)))</span>
<span class="w"> </span><span class="nb">copy</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[</span><span class="mi">16</span><span class="p">:],</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Command</span><span class="p">))</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bw</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[:])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">"Wrote full entry"</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">ENTRY_SIZE</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">bw</span><span class="p">.</span><span class="nx">Flush</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Sync</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Persisted in %s. Term: %d. Log Len: %d (%d new). Voted For: %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">Sub</span><span class="p">(</span><span class="nx">t</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">),</span><span class="w"> </span><span class="nx">nNewEntries</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getVotedFor</span><span class="p">())</span>
<span class="p">}</span>
</pre></div>
<p>Again the important thing is that only the entries that <em>need</em> to be
written are written. We do that by <code>seek</code>-ing to the offset of the
first entry that needs to be written.</p>
<p>And we collect writes of entries in a <code>bufio.Writer</code> so we don't waste
write syscalls. Don't forget to flush the buffered writer!</p>
<p>And don't forget to flush all writes to disk with <code>fd.Sync()</code>.</p>
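<p>To make the seek, buffer, flush and sync sequence concrete, here is a
minimal standalone sketch. It is not the code from this post: the
names <code>entryOffset</code> and <code>writeFrom</code> are hypothetical, the sizes
are assumed values, and each slot holds raw bytes rather than the
real entry layout.</p>

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// Assumed sizes for this sketch only.
const (
	pageSize  = 4096 // one metadata page at the start of the file
	entrySize = 128  // every log entry occupies a fixed-size slot
)

// entryOffset returns the byte offset of the slot for log entry i.
func entryOffset(i int) int64 {
	return int64(pageSize) + int64(i)*int64(entrySize)
}

// writeFrom rewrites only slots[first:], leaving earlier slots untouched.
func writeFrom(f *os.File, slots [][entrySize]byte, first int) error {
	if _, err := f.Seek(entryOffset(first), io.SeekStart); err != nil {
		return err
	}
	bw := bufio.NewWriter(f) // batches many small writes into few syscalls
	for i := first; i < len(slots); i++ {
		if _, err := bw.Write(slots[i][:]); err != nil {
			return err
		}
	}
	if err := bw.Flush(); err != nil { // push buffered bytes to the file
		return err
	}
	return f.Sync() // ask the OS to persist them to disk
}

func main() {
	f, err := os.CreateTemp("", "log-sketch-*.dat")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	defer f.Close()

	slots := make([][entrySize]byte, 3)
	if err := writeFrom(f, slots, 0); err != nil {
		panic(err)
	}
	// Only the third entry changed, so only its slot is rewritten.
	copy(slots[2][:], "changed")
	if err := writeFrom(f, slots, 2); err != nil {
		panic(err)
	}
	fmt.Println("entry 2 starts at byte", entryOffset(2))
}
```

<p>Because every slot has the same size, the offset of any entry is a
single multiplication, which is what makes rewriting only the tail of
the log cheap.</p>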
<p class="note">
<code>ENTRY_SIZE</code> is something that I could see being configurable based
on the workload. Some workloads truly need only 128 bytes. But a
key-value store probably wants much more than that. This
implementation doesn't try to handle arbitrarily sized keys and
values.
</p><p>Lastly, a few helpers used in there:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">min</span><span class="p">[</span><span class="nx">T</span><span class="w"> </span><span class="o">~</span><span class="kt">int</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">~</span><span class="kt">uint64</span><span class="p">](</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="nx">T</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">max</span><span class="p">[</span><span class="nx">T</span><span class="w"> </span><span class="o">~</span><span class="kt">int</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">~</span><span class="kt">uint64</span><span class="p">](</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="nx">T</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span>
<span class="p">}</span>
<span class="c1">// Must be called within s.mu.Lock()</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">getVotedFor</span><span class="p">()</span><span class="w"> </span><span class="kt">uint64</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">"Invalid cluster"</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span>
<span class="p">}</span>
</pre></div>
<h4 id="<code>s.restore()</code>"><code>s.restore()</code></h4><p>Now let's do the reverse operation, restoring from disk. This will
only be called once on startup.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">restore</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">OpenFile</span><span class="p">(</span>
<span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">metadataDir</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"md_%d.dat"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span><span class="p">)),</span>
<span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">O_SYNC</span><span class="o">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_CREATE</span><span class="o">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_RDWR</span><span class="p">,</span>
<span class="w"> </span><span class="mo">0755</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Bytes 0 - 8: Current term</span>
<span class="w"> </span><span class="c1">// Bytes 8 - 16: Voted for</span>
<span class="w"> </span><span class="c1">// Bytes 16 - 24: Log length</span>
<span class="w"> </span><span class="c1">// Bytes 4096 - N: Log</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">page</span><span class="w"> </span><span class="p">[</span><span class="nx">PAGE_SIZE</span><span class="p">]</span><span class="kt">byte</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">page</span><span class="p">[:])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">ensureLog</span><span class="p">()</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">"Read full page"</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">PAGE_SIZE</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[:</span><span class="mi">8</span><span class="p">])</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">setVotedFor</span><span class="p">(</span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[</span><span class="mi">8</span><span class="p">:</span><span class="mi">16</span><span class="p">]))</span>
<span class="w"> </span><span class="nx">lenLog</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[</span><span class="mi">16</span><span class="p">:</span><span class="mi">24</span><span class="p">])</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lenLog</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="nb">int64</span><span class="p">(</span><span class="nx">PAGE_SIZE</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="nx">Entry</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">lenLog</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">entryBytes</span><span class="w"> </span><span class="p">[</span><span class="nx">ENTRY_SIZE</span><span class="p">]</span><span class="kt">byte</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[:])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">"Read full entry"</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">ENTRY_SIZE</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Bytes 0 - 8: Entry term</span>
<span class="w"> </span><span class="c1">// Bytes 8 - 16: Entry command length</span>
<span class="w"> </span><span class="c1">// Bytes 16 - ENTRY_SIZE: Entry command</span>
<span class="w"> </span><span class="nx">e</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[:</span><span class="mi">8</span><span class="p">])</span>
<span class="w"> </span><span class="nx">lenValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[</span><span class="mi">8</span><span class="p">:</span><span class="mi">16</span><span class="p">])</span>
<span class="w"> </span><span class="nx">e</span><span class="p">.</span><span class="nx">Command</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">entryBytes</span><span class="p">[</span><span class="mi">16</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">16</span><span class="o">+</span><span class="nx">lenValue</span><span class="p">]</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">ensureLog</span><span class="p">()</span>
<span class="p">}</span>
</pre></div>
<p>And a few helpers it calls:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">ensureLog</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Always has at least one log entry.</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span><span class="w"> </span><span class="nx">Entry</span><span class="p">{})</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="c1">// Must be called within s.mu.Lock()</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">setVotedFor</span><span class="p">(</span><span class="nx">id</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">id</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">"Invalid cluster"</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
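<p>The entry layout described in the comments of <code>restore()</code> (term,
command length, then the command itself) can be checked in isolation
with a small round trip. This is only a sketch: <code>encodeEntry</code>,
<code>decodeEntry</code> and the lowercase <code>entry</code> type are hypothetical
names, and the 128 byte entry size is an assumed value.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const entrySize = 128 // assumed value for this sketch

// entry is a stand-in for the post's Entry type.
type entry struct {
	Term    uint64
	Command []byte
}

// encodeEntry packs an entry into a fixed-size slot:
// bytes 0-8 term, bytes 8-16 command length, bytes 16+ command.
func encodeEntry(e entry) ([entrySize]byte, error) {
	var buf [entrySize]byte
	if len(e.Command) > entrySize-16 {
		return buf, fmt.Errorf("command is %d bytes, max is %d", len(e.Command), entrySize-16)
	}
	binary.LittleEndian.PutUint64(buf[0:8], e.Term)
	binary.LittleEndian.PutUint64(buf[8:16], uint64(len(e.Command)))
	copy(buf[16:], e.Command)
	return buf, nil
}

// decodeEntry is the inverse. It validates the stored length before
// slicing so that a corrupt slot cannot cause an out-of-range panic.
func decodeEntry(buf [entrySize]byte) (entry, error) {
	n := binary.LittleEndian.Uint64(buf[8:16])
	if n > entrySize-16 {
		return entry{}, fmt.Errorf("corrupt command length %d", n)
	}
	cmd := make([]byte, n)
	copy(cmd, buf[16:16+n])
	return entry{Term: binary.LittleEndian.Uint64(buf[0:8]), Command: cmd}, nil
}

func main() {
	buf, err := encodeEntry(entry{Term: 7, Command: []byte("set x 1")})
	if err != nil {
		panic(err)
	}
	e, err := decodeEntry(buf)
	if err != nil {
		panic(err)
	}
	fmt.Printf("term=%d command=%q\n", e.Term, e.Command)
}
```

<p>Unlike the code above, the sketch copies the command out of the slot
and rejects oversized commands. Those are choices made for the
sketch, not something the implementation in this post does.</p>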
<h3 id="the-main-loop">The main loop</h3><p>Now let's think about the main loop. Before starting the loop we need
to 1) restore persistent state from disk and 2) kick off an RPC
server so servers in the cluster can send and receive messages to and
from each other.</p>
<div class="highlight"><pre><span></span><span class="c1">// Make sure rand is seeded</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">Start</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">followerState</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">done</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">restore</span><span class="p">()</span>
<span class="w"> </span><span class="nx">rpcServer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">rpc</span><span class="p">.</span><span class="nx">NewServer</span><span class="p">()</span>
<span class="w"> </span><span class="nx">rpcServer</span><span class="p">.</span><span class="nx">Register</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span>
<span class="w"> </span><span class="nx">l</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">Listen</span><span class="p">(</span><span class="s">"tcp"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">address</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">mux</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">NewServeMux</span><span class="p">()</span>
<span class="w"> </span><span class="nx">mux</span><span class="p">.</span><span class="nx">Handle</span><span class="p">(</span><span class="nx">rpc</span><span class="p">.</span><span class="nx">DefaultRPCPath</span><span class="p">,</span><span class="w"> </span><span class="nx">rpcServer</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">server</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&</span><span class="nx">http</span><span class="p">.</span><span class="nx">Server</span><span class="p">{</span><span class="nx">Handler</span><span class="p">:</span><span class="w"> </span><span class="nx">mux</span><span class="p">}</span>
<span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">server</span><span class="p">.</span><span class="nx">Serve</span><span class="p">(</span><span class="nx">l</span><span class="p">)</span>
<span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">resetElectionTimeout</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">done</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">state</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
</pre></div>
<p>In the main loop we are either in the leader state, follower state or
candidate state.</p>
<p>All states will potentially receive RPC messages from other servers in
the cluster but that won't be modeled in this main loop.</p>
<p>The only thing going on in the main loop is that:</p>
<ul>
<li>We send heartbeat RPCs (leader state)</li>
<li>We try to advance the commit index (leader state only) and apply commands to the state machine (leader and follower states)</li>
<li>We trigger a new election if we haven't received a message in some time (candidate and follower states)</li>
<li>Or we become the leader (candidate state)</li>
</ul>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">state</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">leaderState</span><span class="p">:</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeat</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">advanceCommitIndex</span><span class="p">()</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">followerState</span><span class="p">:</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">timeout</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">advanceCommitIndex</span><span class="p">()</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">candidateState</span><span class="p">:</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">timeout</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">becomeLeader</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}()</span>
<span class="p">}</span>
</pre></div>
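<p>The shape of that loop (copy the state while holding the lock,
release the lock, then dispatch on the copy) can be tried out on its
own. Below is a toy sketch under assumed names: <code>node</code>, <code>step</code> and
<code>record</code> are hypothetical and the handlers only record that they
ran.</p>

```go
package main

import (
	"fmt"
	"sync"
)

type state int

const (
	follower state = iota
	candidate
	leader
)

// node is a hypothetical stand-in for the post's Server.
type node struct {
	mu    sync.Mutex
	state state
	done  bool
	calls []string // which handlers ran, for illustration only
}

// record stands in for a real handler. It takes the lock itself, which is
// why step must not hold the lock while dispatching.
func (n *node) record(name string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.calls = append(n.calls, name)
}

// step runs one loop iteration and returns false once the node is done.
func (n *node) step() bool {
	n.mu.Lock()
	if n.done {
		n.mu.Unlock()
		return false
	}
	st := n.state // copy under the lock
	n.mu.Unlock()

	switch st {
	case leader:
		n.record("heartbeat")
	case follower:
		n.record("timeout")
	case candidate:
		n.record("timeout")
		n.record("becomeLeader")
	}
	return true
}

func main() {
	n := &node{state: candidate}
	n.step()
	fmt.Println(n.calls)
}
```

<p>Since <code>sync.Mutex</code> is not reentrant, dispatching while still
holding the lock would deadlock as soon as a handler tried to lock
it again.</p>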
<p>Let's deal with leader election first.</p>
<h3 id="leader-election">Leader election</h3><p>Leader election happens every time nodes haven't received a message
from a valid leader in some time.</p>
<p>I'll break this up into four major pieces:</p>
<ol>
<li>Timing out and becoming a candidate after a random (but bounded)
period of not hearing from a valid leader:
<code>s.timeout()</code>.</li>
<li>The candidate requests votes from all other servers: <code>s.requestVote()</code>.</li>
<li>All servers handle vote requests: <code>s.HandleRequestVoteRequest()</code>.</li>
<li>A candidate with a quorum of votes becomes the leader: <code>s.becomeLeader()</code>.</li>
</ol>
<p>On timing out, we increment <code>currentTerm</code>, vote for ourselves, and send RPC vote
requests to the other nodes in the cluster.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">resetElectionTimeout</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">interval</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Duration</span><span class="p">(</span><span class="nx">rand</span><span class="p">.</span><span class="nx">Intn</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatMs</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatMs</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"New interval: %s."</span><span class="p">,</span><span class="w"> </span><span class="nx">interval</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Millisecond</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">electionTimeout</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">Add</span><span class="p">(</span><span class="nx">interval</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Millisecond</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">timeout</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">hasTimedOut</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">After</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">electionTimeout</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">hasTimedOut</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="s">"Timed out, starting new election."</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">candidateState</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="o">++</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">resetElectionTimeout</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">requestVote</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Everything in there is implemented already except for
<code>s.requestVote()</code>. Let's dig into that.</p>
<h4 id="<code>s.requestvote()</code>"><code>s.requestVote()</code></h4><p>By referring back to Figure 2 from the Raft paper we can see how to
model the request vote request and response. Let's turn that into some Go types.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">RPCMessage</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Term</span><span class="w"> </span><span class="kt">uint64</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">RequestVoteRequest</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">RPCMessage</span>
<span class="w"> </span><span class="c1">// Candidate requesting vote</span>
<span class="w"> </span><span class="nx">CandidateId</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// Index of candidate's last log entry</span>
<span class="w"> </span><span class="nx">LastLogIndex</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// Term of candidate's last log entry</span>
<span class="w"> </span><span class="nx">LastLogTerm</span><span class="w"> </span><span class="kt">uint64</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">RequestVoteResponse</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">RPCMessage</span>
<span class="w"> </span><span class="c1">// True means candidate received vote</span>
<span class="w"> </span><span class="nx">VoteGranted</span><span class="w"> </span><span class="kt">bool</span>
<span class="p">}</span>
</pre></div>
<p>Now we just need to fill out the <code>RequestVoteRequest</code> struct and send
it to every other node in the cluster in parallel. As we iterate
through nodes in the cluster, we skip ourselves (we always immediately
vote for ourselves).</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">requestVote</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Requesting vote from %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">)</span>
<span class="w"> </span><span class="nx">lastLogIndex</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="nx">lastLogTerm</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">Term</span>
<span class="w"> </span><span class="nx">req</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">RequestVoteRequest</span><span class="p">{</span>
<span class="w"> </span><span class="nx">RPCMessage</span><span class="p">:</span><span class="w"> </span><span class="nx">RPCMessage</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Term</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">CandidateId</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span>
<span class="w"> </span><span class="nx">LastLogIndex</span><span class="p">:</span><span class="w"> </span><span class="nx">lastLogIndex</span><span class="p">,</span>
<span class="w"> </span><span class="nx">LastLogTerm</span><span class="p">:</span><span class="w"> </span><span class="nx">lastLogTerm</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="nx">RequestVoteResponse</span>
<span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">rpcCall</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Server.HandleRequestVoteRequest"</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">rsp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Will retry later</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Now remember from Figure 2 in the Raft paper that we must always check
that the RPC request and response are still valid. If the term of the
response is greater than our own term, we must immediately stop
processing and revert to follower state.</p>
<p>Otherwise, we count the vote only if the response is still relevant
(the response term is the same as the request term) <em>and</em> the vote
was granted.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">updateTerm</span><span class="p">(</span><span class="nx">rsp</span><span class="p">.</span><span class="nx">RPCMessage</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">dropStaleResponse</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">dropStaleResponse</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">VoteGranted</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Vote granted by %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}(</span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>And that's it for the candidate side of requesting a vote.</p>
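<p>As a rough sketch of what "a quorum of votes" means for the last of the four pieces: a candidate needs votes from a strict majority of the cluster, counting its own. The <code>member</code> type and <code>hasQuorum</code> function below are hypothetical names for illustration, not the ones used in this implementation. They only assume that each cluster entry records who it voted for, as <code>votedFor</code> does above.</p>

```go
package main

import "fmt"

// member is a hypothetical stand-in for a cluster entry; only the
// votedFor field matters for this sketch.
type member struct {
	id       uint64
	votedFor uint64
}

// hasQuorum reports whether candidateId holds votes from a strict
// majority of the cluster (the candidate's own vote included).
func hasQuorum(cluster []member, candidateId uint64) bool {
	quorum := len(cluster)/2 + 1
	votes := 0
	for _, m := range cluster {
		if m.votedFor == candidateId {
			votes++
		}
	}
	return votes >= quorum
}

func main() {
	cluster := []member{
		{id: 1, votedFor: 1}, // the candidate voted for itself
		{id: 2, votedFor: 1}, // one peer granted its vote
		{id: 3, votedFor: 0}, // no response yet
	}
	fmt.Println(hasQuorum(cluster, 1)) // true: 2 of 3 is a majority
}
```

<p>With five nodes the same two votes would not be enough, since the quorum becomes three.</p>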
<p>The implementation of <code>s.updateTerm()</code> is simple. It just takes care
of transitioning to follower state if the term of an RPC message is
greater than the node's current term.</p>
<div class="highlight"><pre><span></span><span class="c1">// Must be called within a s.mu.Lock()</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">updateTerm</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="nx">RPCMessage</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">transitioned</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">Term</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">followerState</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">setVotedFor</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="nx">transitioned</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="s">"Transitioned to follower"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">resetElectionTimeout</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">transitioned</span>
<span class="p">}</span>
</pre></div>
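<p>The term checks on both sides of an RPC come down to a three-way comparison between the message's term and our own. Here is a minimal sketch of that comparison. The <code>compareTerm</code> helper and its <code>termAction</code> values are hypothetical and not part of this implementation, which spreads the same checks across <code>s.updateTerm()</code> and the individual handlers.</p>

```go
package main

import "fmt"

// termAction names what a node should do with an incoming RPC message.
// These names are made up for this sketch.
type termAction string

const (
	// The sender is ahead of us: adopt its term and become a follower.
	// (A request handler then keeps processing; a response handler stops.)
	stepDown termAction = "step down"
	// Same term: the message is current and can be processed.
	process termAction = "process"
	// The sender is behind us: the message is stale.
	ignore termAction = "ignore"
)

// compareTerm classifies a message by comparing its term to ours.
func compareTerm(currentTerm, msgTerm uint64) termAction {
	switch {
	case msgTerm > currentTerm:
		return stepDown
	case msgTerm < currentTerm:
		return ignore
	default:
		return process
	}
}

func main() {
	fmt.Println(compareTerm(2, 3)) // step down
	fmt.Println(compareTerm(3, 3)) // process
	fmt.Println(compareTerm(3, 2)) // ignore
}
```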
<p>And the implementation of <code>s.rpcCall()</code> is a wrapper around <code>net/rpc</code>
that connects lazily.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">rpcCall</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">,</span><span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">rpcClient</span><span class="w"> </span><span class="o">*</span><span class="nx">rpc</span><span class="p">.</span><span class="nx">Client</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">rpcClient</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">rpcClient</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">rpcClient</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">rpc</span><span class="p">.</span><span class="nx">DialHTTP</span><span class="p">(</span><span class="s">"tcp"</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Address</span><span class="p">)</span>
<span class="w"> </span><span class="nx">rpcClient</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">rpcClient</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="c1">// TODO: where/how to reconnect if the connection must be reestablished?</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">rpcClient</span><span class="p">.</span><span class="nx">Call</span><span class="p">(</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">,</span><span class="w"> </span><span class="nx">rsp</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">warnf</span><span class="p">(</span><span class="s">"Error calling %s on %d: %s."</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Id</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
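<p>For <code>rpc.DialHTTP</code> to succeed, the peer must already be serving <code>net/rpc</code> over HTTP. The following is a minimal, self-contained sketch of lazy connection against an in-process server. The <code>Echo</code> service, the <code>peer</code> type, and <code>startEchoServer</code> are hypothetical and exist only for this example; they are not how this implementation wires up its own server.</p>

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"net/rpc"
	"sync"
)

// Echo is a hypothetical RPC service that returns the term it was sent.
type Echo struct{}

type EchoArgs struct{ Term uint64 }
type EchoReply struct{ Term uint64 }

func (e *Echo) Handle(req EchoArgs, rsp *EchoReply) error {
	rsp.Term = req.Term
	return nil
}

// peer holds an address and a client that is created on first use.
type peer struct {
	mu      sync.Mutex
	address string
	client  *rpc.Client
}

// call dials the peer only if no client exists yet, then makes the call.
func (p *peer) call(name string, req, rsp any) bool {
	p.mu.Lock()
	if p.client == nil {
		c, err := rpc.DialHTTP("tcp", p.address)
		if err != nil {
			p.mu.Unlock()
			return false // the caller can retry later
		}
		p.client = c
	}
	client := p.client
	p.mu.Unlock()
	// The network call itself happens outside the lock.
	return client.Call(name, req, rsp) == nil
}

// startEchoServer serves Echo on a random local port and returns its address.
func startEchoServer() (string, error) {
	srv := rpc.NewServer()
	if err := srv.Register(&Echo{}); err != nil {
		return "", err
	}
	mux := http.NewServeMux()
	mux.Handle(rpc.DefaultRPCPath, srv)
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	go http.Serve(l, mux)
	return l.Addr().String(), nil
}

func main() {
	addr, err := startEchoServer()
	if err != nil {
		fmt.Println("could not start server:", err)
		return
	}
	p := &peer{address: addr}
	var rsp EchoReply
	ok := p.call("Echo.Handle", EchoArgs{Term: 7}, &rsp)
	fmt.Println(ok, rsp.Term)
}
```

<p>Unlike <code>s.rpcCall()</code>, this sketch stores the client on a pointer receiver, so the connection made by the first call is reused by later ones.</p>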
<p>Let's dig into the other side of request vote: what happens when a
node receives a vote request?</p>
<h4 id="<code>s.handlevoterequest()</code>"><code>s.HandleRequestVoteRequest()</code></h4><p>First off, as discussed above, we must always check the RPC term
versus our own and revert to follower if the term is greater than our
own. (Remember that since this is an RPC request it could come to a
server in any state: leader, candidate, or follower.)</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">HandleRequestVoteRequest</span><span class="p">(</span><span class="nx">req</span><span class="w"> </span><span class="nx">RequestVoteRequest</span><span class="p">,</span><span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="o">*</span><span class="nx">RequestVoteResponse</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">updateTerm</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">RPCMessage</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Received vote request from %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">CandidateId</span><span class="p">)</span>
</pre></div>
<p>Then we can return immediately if the request term is lower than our
own (that means it's an old request).</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">VoteGranted</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Not granting vote request from %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">CandidateId</span><span class="p">)</span>
<span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">"VoteGranted = false"</span><span class="p">,</span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">VoteGranted</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And finally, we check to make sure the requester's log is at least as
up-to-date as our own and that we haven't already voted for a different
candidate in this term (which includes having voted for ourselves).</p>
<p>The first condition (up-to-date log) is easy to overlook in
the Raft paper: it appears in the RequestVote receiver rules in Figure 2
and is explained in section 5.4.1 (election restriction). The author of the paper
also published a Raft TLA+ spec that does <a href="https://github.com/ongardie/raft.tla/blob/master/raft.tla#L284">have it
defined</a>.</p>
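<p>To make the "at least as up-to-date" rule concrete, here is a minimal sketch as a standalone function. The name <code>logUpToDate</code> and its parameters are hypothetical, chosen for illustration; the rule itself is the usual Raft one: a later last term wins, and equal last terms are decided by the last index.</p>

```go
package main

import "fmt"

// logUpToDate reports whether a candidate's log, described by the term and
// index of its last entry, is at least as up-to-date as the receiver's.
func logUpToDate(candLastTerm, candLastIndex, myLastTerm, myLastIndex uint64) bool {
	if candLastTerm != myLastTerm {
		// Different last terms: the log ending in the later term wins,
		// regardless of length.
		return candLastTerm > myLastTerm
	}
	// Same last term: the longer (or equally long) log wins.
	return candLastIndex >= myLastIndex
}

func main() {
	fmt.Println(logUpToDate(3, 1, 2, 9)) // true: later term beats a longer log
	fmt.Println(logUpToDate(2, 4, 2, 5)) // false: same term, shorter log
	fmt.Println(logUpToDate(2, 5, 2, 5)) // true: identical position
	fmt.Println(logUpToDate(1, 9, 2, 0)) // false: earlier term
}
```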
<p>As for the second condition, you might think a server could never
grant a vote, since we already wrote the code that says a server votes
for itself when it triggers an election. But each server has a random
election timeout, so the server that starts an election usually does so
before the others time out, while they are still free to vote for it.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">lastLogTerm</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">Term</span>
<span class="w"> </span><span class="nx">logLen</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="nx">logOk</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">LastLogTerm</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nx">lastLogTerm</span><span class="w"> </span><span class="o">||</span>
<span class="w"> </span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">LastLogTerm</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">lastLogTerm</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">LastLogIndex</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nx">logLen</span><span class="p">)</span>
<span class="w"> </span><span class="nx">grant</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">logOk</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">getVotedFor</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getVotedFor</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">CandidateId</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">grant</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Voted for %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">CandidateId</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">setVotedFor</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">CandidateId</span><span class="p">)</span>
<span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">VoteGranted</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">resetElectionTimeout</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Not granting vote request from %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">CandidateId</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
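<p>The <code>logOk</code> comparison above is easy to get subtly wrong, so here
is a minimal standalone sketch of just that check. The function name
<code>logUpToDate</code> and its flat parameters are made up for
illustration; they are not part of the server code in this post.</p>

```go
package main

import "fmt"

// logUpToDate reports whether a candidate's log is at least as
// up-to-date as the voter's log. A higher last term always wins. When
// the last terms are equal, the candidate's last index must be at
// least the voter's last index.
// (Hypothetical helper, not from the server code above.)
func logUpToDate(candLastTerm, candLastIndex, myLastTerm, myLastIndex uint64) bool {
	if candLastTerm != myLastTerm {
		return candLastTerm > myLastTerm
	}
	return candLastIndex >= myLastIndex
}

func main() {
	// Candidate has a newer last term, so log length does not matter.
	fmt.Println(logUpToDate(3, 1, 2, 9)) // true
	// Same last term, but the candidate's log is shorter.
	fmt.Println(logUpToDate(2, 4, 2, 5)) // false
	// Identical last term and index.
	fmt.Println(logUpToDate(2, 5, 2, 5)) // true
}
```

<p>Keeping this as a pure function makes it easy to unit test the edge
cases separately from the rest of the vote handler.</p>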
<p>Lastly, we need to address how the candidate who sent out vote
requests actually becomes the leader.</p>
<h4 id="<code>s.becomeleader()</code>"><code>s.becomeLeader()</code></h4><p>This is a relatively simple method. If we have a quorum of votes, we
become the leader!</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">becomeLeader</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">quorum</span><span class="o">--</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>There is a bit of bookkeeping we need to do like resetting <code>nextIndex</code>
and <code>matchIndex</code> for each server (noted in Figure 2). And we also need
to append a blank entry for the new term.</p>
<p class="note">
Despite the section quoted below in code, I still don't understand
why this blank entry is necessary.
</p><div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Reset all cluster state</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Yes, even matchIndex is reset. Figure 2</span>
<span class="w"> </span><span class="c1">// from Raft shows both nextIndex and</span>
<span class="w"> </span><span class="c1">// matchIndex are reset after every election.</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">matchIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="s">"New leader."</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">leaderState</span>
<span class="w"> </span><span class="c1">// From Section 8 Client Interaction:</span>
<span class="w"> </span><span class="c1">// > First, a leader must have the latest information on</span>
<span class="w"> </span><span class="c1">// > which entries are committed. The Leader</span>
<span class="w"> </span><span class="c1">// > Completeness Property guarantees that a leader has</span>
<span class="w"> </span><span class="c1">// > all committed entries, but at the start of its</span>
<span class="w"> </span><span class="c1">// > term, it may not know which those are. To find out,</span>
<span class="w"> </span><span class="c1">// > it needs to commit an entry from its term. Raft</span>
<span class="w"> </span><span class="c1">// > handles this by having each leader commit a blank</span>
<span class="w"> </span><span class="c1">// > no-op entry into the log at the start of its term.</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span><span class="w"> </span><span class="nx">Entry</span><span class="p">{</span><span class="nx">Term</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span><span class="w"> </span><span class="nx">Command</span><span class="p">:</span><span class="w"> </span><span class="kc">nil</span><span class="p">})</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Triggers s.appendEntries() in the next tick of the</span>
<span class="w"> </span><span class="c1">// main state loop.</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatTimeout</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
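<p>To make the quorum arithmetic concrete, here is a small sketch of
counting votes against a majority. The helpers <code>quorumSize</code> and
<code>wonElection</code> are hypothetical names, and the flat slice of votes
is an assumption that stands in for <code>s.cluster</code>.</p>

```go
package main

import "fmt"

// quorumSize returns the smallest number of servers that forms a
// strict majority of a cluster with n members.
func quorumSize(n int) int {
	return n/2 + 1
}

// wonElection counts how many servers voted for candidate id and
// reports whether that reaches a quorum. votedFor[i] is the id that
// server i voted for this term, or 0 for no vote (an assumption that
// mirrors the zero check in the vote handler).
func wonElection(votedFor []uint64, id uint64) bool {
	votes := 0
	for _, v := range votedFor {
		if v == id {
			votes++
		}
	}
	return votes >= quorumSize(len(votedFor))
}

func main() {
	fmt.Println(quorumSize(3), quorumSize(4), quorumSize(5)) // 2 3 3
	fmt.Println(wonElection([]uint64{1, 1, 2}, 1))           // true
	fmt.Println(wonElection([]uint64{1, 1, 2, 0}, 1))        // false
}
```

<p>Note that a cluster of four needs three votes, the same as a cluster
of five, which is one reason odd cluster sizes are the usual choice.</p>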
<p>And we're done with elections!</p>
<p>When I was working on this for the first time, I just stopped here and
made sure I could get to a stable leader quickly. If it takes more
than 1 term to establish a leader when you run three servers in the
cluster on localhost, you've probably got a bug.</p>
<p>In an ideal environment (which three processes on one machine most
likely is), leadership should be established quite quickly and without
many term changes. As the environment gets more adversarial
(e.g. processes crash frequently or network latency is high and
variable), leadership (and log replication) will take longer.</p>
<p class="note">
But just because we have leader election working when the log is
empty does not mean we'll have it working when we introduce log
replication, since parts of voting depend on log analysis.
<br />
I had leader election working at one point, but it broke once I got
log replication working. It stayed broken until I found and fixed
some more bugs in leader election. Of course, there may still be
bugs even now.
</p><h3 id="log-replication">Log replication</h3><p>I'll break up log replication into four major pieces:</p>
<ol>
<li>User submits a message to the leader to be replicated: <code>s.Apply()</code>.</li>
<li>The leader sends unreplicated messages (messages from each
follower's <code>nextIndex</code> onward) to all followers: <code>s.appendEntries()</code>.</li>
<li>A follower receives an <code>AppendEntriesRequest</code> and stores new
messages if appropriate, letting the leader know when it does store
the messages: <code>s.HandleAppendEntriesRequest()</code>.</li>
<li>The leader tries to update <code>commitIndex</code> for the last uncommitted
message by seeing if it's been replicated on a quorum of servers:
<code>s.advanceCommitIndex()</code>.</li>
</ol>
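<p>Step 4 is the least obvious of the four, so here is a rough sketch
of the idea before we get to the real code. It is not the
<code>s.advanceCommitIndex()</code> implementation from this post. The
function names, the <code>logTerms</code> slice and the convention that
<code>matchIndex</code> includes the leader's own last index are all
assumptions for illustration. The rule itself comes from Figure 2 of
the Raft paper: commit the highest index stored on a majority, but only
if that entry is from the current term.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// quorumMatchIndex returns the highest log index known to be stored on
// a majority of servers. matchIndex holds one value per server,
// including the leader's own last log index (an assumption here).
func quorumMatchIndex(matchIndex []uint64) uint64 {
	sorted := append([]uint64(nil), matchIndex...)
	sort.Slice(sorted, func(a, b int) bool { return sorted[a] > sorted[b] })
	// In descending order, the value at position quorum-1 is present
	// on at least quorum servers.
	quorum := len(sorted)/2 + 1
	return sorted[quorum-1]
}

// advanceCommitIndex returns the new commit index. logTerms[i] is the
// term of the leader's log entry at index i. The index only advances
// when the quorum-replicated entry belongs to the current term.
func advanceCommitIndex(commitIndex uint64, matchIndex, logTerms []uint64, currentTerm uint64) uint64 {
	n := quorumMatchIndex(matchIndex)
	if n > commitIndex && logTerms[n] == currentTerm {
		return n
	}
	return commitIndex
}

func main() {
	logTerms := []uint64{0, 1, 1, 2, 2, 2} // term of the entry at each index
	matchIndex := []uint64{5, 4, 2}        // leader, follower, follower
	fmt.Println(advanceCommitIndex(1, matchIndex, logTerms, 2)) // 4
	fmt.Println(advanceCommitIndex(1, matchIndex, logTerms, 3)) // 1
}
```

<p>In the second call nothing is committed because the entry at index 4
is from an older term than the leader's current term.</p>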
<p>Let's dig in in that order.</p>
<h4 id="<code>s.apply()</code>"><code>s.Apply()</code></h4><p>This is the entry point for a user of the cluster to attempt to get
messages replicated into the cluster.</p>
<p>It must be called on the current leader of the cluster. In the future
the failure response might include the current leader. Or the user
could submit messages in parallel to all nodes in the cluster and
ignore <code>ErrApplyToLeader</code>. In the meantime we just assume the user can
figure out which server in the cluster is the leader.</p>
<div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">ErrApplyToLeader</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">"Cannot apply message to follower, apply to leader."</span><span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">Apply</span><span class="p">(</span><span class="nx">commands</span><span class="w"> </span><span class="p">[][]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="nx">ApplyResult</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">leaderState</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrApplyToLeader</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Processing %d new entry!"</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">commands</span><span class="p">))</span>
</pre></div>
<p>Next we'll store each message in the leader's log along with a Go
channel. We block on that channel until the message has been
committed to the cluster and applied to the state machine, at which
point the channel delivers the result.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">resultChans</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kd">chan</span><span class="w"> </span><span class="nx">ApplyResult</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">commands</span><span class="p">))</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">command</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">commands</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">resultChans</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">make</span><span class="p">(</span><span class="kd">chan</span><span class="w"> </span><span class="nx">ApplyResult</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span><span class="w"> </span><span class="nx">Entry</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Term</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Command</span><span class="p">:</span><span class="w"> </span><span class="nx">command</span><span class="p">,</span>
<span class="w"> </span><span class="nx">result</span><span class="p">:</span><span class="w"> </span><span class="nx">resultChans</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">commands</span><span class="p">))</span>
</pre></div>
<p>Then we kick off the replication process (this will not block).</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="s">"Waiting to be applied!"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">appendEntries</span><span class="p">()</span>
</pre></div>
<p>And then we block until we receive results from each of the channels we created.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// TODO: What happens if this takes too long?</span>
<span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="nx">ApplyResult</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">commands</span><span class="p">))</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">wg</span><span class="w"> </span><span class="nx">sync</span><span class="p">.</span><span class="nx">WaitGroup</span>
<span class="w"> </span><span class="nx">wg</span><span class="p">.</span><span class="nx">Add</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">commands</span><span class="p">))</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">ch</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">resultChans</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="kd">chan</span><span class="w"> </span><span class="nx">ApplyResult</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">results</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o"><-</span><span class="nx">c</span>
<span class="w"> </span><span class="nx">wg</span><span class="p">.</span><span class="nx">Done</span><span class="p">()</span>
<span class="w"> </span><span class="p">}(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">ch</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">wg</span><span class="p">.</span><span class="nx">Wait</span><span class="p">()</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>The interesting thing here is that appending entries is detached from
the messages we just received. <code>s.appendEntries()</code> will probably
include at least the messages we just appended to our log, but it
might include more if some servers are not up-to-date. It may
even include fewer messages than we just appended, since we
restrict the number of entries sent at one time to keep latency
down.</p>
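<p>As for the TODO in <code>s.Apply()</code> about waiting too long, one
possible approach is to bound the wait with a deadline. This is only a
sketch and not what the code in this post does. <code>waitForResults</code>,
<code>errApplyTimeout</code> and <code>demoTimeout</code> are hypothetical
names, and the <code>ApplyResult</code> fields here are a stand-in for the
real type. It also assumes the result channels are buffered with
capacity 1, so the goroutine delivering a result never blocks after the
caller has given up.</p>

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ApplyResult is a stand-in for the result type used by s.Apply().
type ApplyResult struct {
	Result []byte
	Error  error
}

var errApplyTimeout = errors.New("timed out waiting for results")

// demoTimeout is an arbitrary value chosen for this example.
const demoTimeout = 50 * time.Millisecond

// waitForResults collects one result per channel, in order, and gives
// up once the overall deadline passes.
func waitForResults(chans []chan ApplyResult, timeout time.Duration) ([]ApplyResult, error) {
	deadline := time.After(timeout)
	results := make([]ApplyResult, len(chans))
	for i, ch := range chans {
		select {
		case results[i] = <-ch:
		case <-deadline:
			return nil, errApplyTimeout
		}
	}
	return results, nil
}

func main() {
	chans := []chan ApplyResult{
		make(chan ApplyResult, 1),
		make(chan ApplyResult, 1),
	}
	chans[0] <- ApplyResult{Result: []byte("a")}
	// The second channel never receives a result, so the wait times out.
	_, err := waitForResults(chans, demoTimeout)
	fmt.Println(err)
}
```

<p>A timeout here only tells the caller that the outcome is unknown.
The entries are still in the leader's log and may be committed
later, so the caller cannot assume the messages were dropped.</p>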
<h4 id="<code>s.appendentries()</code>"><code>s.appendEntries()</code></h4><p>This is the meat of log replication on the leader side. We send
unreplicated messages to each other server in the cluster.</p>
<p>Referring back to Figure 2 from the Raft paper again, we can see how
to model the append entries request and response. Let's turn that into
some Go types too.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">AppendEntriesRequest</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">RPCMessage</span>
<span class="w"> </span><span class="c1">// So follower can redirect clients</span>
<span class="w"> </span><span class="nx">LeaderId</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// Index of log entry immediately preceding new ones</span>
<span class="w"> </span><span class="nx">PrevLogIndex</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// Term of prevLogIndex entry</span>
<span class="w"> </span><span class="nx">PrevLogTerm</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="c1">// Log entries to store. Empty for heartbeat.</span>
<span class="w"> </span><span class="nx">Entries</span><span class="w"> </span><span class="p">[]</span><span class="nx">Entry</span>
<span class="w"> </span><span class="c1">// Leader's commitIndex</span>
<span class="w"> </span><span class="nx">LeaderCommit</span><span class="w"> </span><span class="kt">uint64</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">AppendEntriesResponse</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">RPCMessage</span>
<span class="w"> </span><span class="c1">// true if follower contained entry matching prevLogIndex and</span>
<span class="w"> </span><span class="c1">// prevLogTerm</span>
<span class="w"> </span><span class="nx">Success</span><span class="w"> </span><span class="kt">bool</span>
<span class="p">}</span>
</pre></div>
<p>For the method itself, we start optimistically sending no entries and
decrement <code>nextIndex</code> for each server as the server fails to replicate
messages. This means that we might eventually end up sending the
entire log to one or all servers.</p>
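<p>Here is a toy simulation of that back-off, stripped of RPCs and
locking. Logs are reduced to slices of terms, and the names
<code>matches</code> and <code>findNextIndex</code> are made up for this
sketch. It assumes every log starts with the same sentinel entry at
index 0, so the search always terminates.</p>

```go
package main

import "fmt"

// matches reports whether the follower has an entry at prevLogIndex
// whose term equals prevLogTerm. This is the consistency check a
// follower performs on each append entries request.
func matches(followerTerms []uint64, prevLogIndex, prevLogTerm uint64) bool {
	return prevLogIndex < uint64(len(followerTerms)) &&
		followerTerms[prevLogIndex] == prevLogTerm
}

// findNextIndex starts optimistically at the end of the leader's log
// and decrements nextIndex each time the follower rejects the request.
// Both logs are assumed to share a sentinel entry at index 0.
func findNextIndex(leaderTerms, followerTerms []uint64) (next uint64, attempts int) {
	next = uint64(len(leaderTerms))
	for {
		attempts++
		prev := next - 1
		if matches(followerTerms, prev, leaderTerms[prev]) {
			return next, attempts
		}
		next--
	}
}

func main() {
	leader := []uint64{0, 1, 1, 2, 3}
	follower := []uint64{0, 1, 1, 1} // diverges from the leader at index 3
	next, attempts := findNextIndex(leader, follower)
	fmt.Printf("next=%d attempts=%d send=%v\n", next, attempts, leader[next:])
	// next=3 attempts=3 send=[2 3]
}
```

<p>Each failed attempt costs a round trip, which is why a follower that
is far behind can take a while to catch up with this simple
one-step-at-a-time approach.</p>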
<p>We'll set a max number of entries to send per request so we avoid
unbounded latency as followers store entries to disk. But we still
want to send a large batch so that we amortize the cost of <code>fsync</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">MAX_APPEND_ENTRIES_BATCH</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">8</span><span class="nx">_000</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">appendEntries</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Don't need to send message to self</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">next</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span>
<span class="w"> </span><span class="nx">prevLogIndex</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">next</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="nx">prevLogTerm</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">prevLogIndex</span><span class="p">].</span><span class="nx">Term</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">entries</span><span class="w"> </span><span class="p">[]</span><span class="nx">Entry</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"len: %d, next: %d, server: %d"</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">),</span><span class="w"> </span><span class="nx">next</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">)</span>
<span class="w"> </span><span class="nx">entries</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">next</span><span class="p">:]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Keep latency down by only sending N entries at a time.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">entries</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nx">MAX_APPEND_ENTRIES_BATCH</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">entries</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">entries</span><span class="p">[:</span><span class="nx">MAX_APPEND_ENTRIES_BATCH</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">lenEntries</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">entries</span><span class="p">))</span>
<span class="w"> </span><span class="nx">req</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">AppendEntriesRequest</span><span class="p">{</span>
<span class="w"> </span><span class="nx">RPCMessage</span><span class="p">:</span><span class="w"> </span><span class="nx">RPCMessage</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Term</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">LeaderId</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="p">].</span><span class="nx">Id</span><span class="p">,</span>
<span class="w"> </span><span class="nx">PrevLogIndex</span><span class="p">:</span><span class="w"> </span><span class="nx">prevLogIndex</span><span class="p">,</span>
<span class="w"> </span><span class="nx">PrevLogTerm</span><span class="p">:</span><span class="w"> </span><span class="nx">prevLogTerm</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Entries</span><span class="p">:</span><span class="w"> </span><span class="nx">entries</span><span class="p">,</span>
<span class="w"> </span><span class="nx">LeaderCommit</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="nx">AppendEntriesResponse</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Sending %d entries to %d for term %d."</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">entries</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="p">)</span>
<span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">rpcCall</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Server.HandleAppendEntriesRequest"</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">rsp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Will retry next tick</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Now, as with every RPC request and response, we must check terms and
potentially drop the message if it's outdated.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">updateTerm</span><span class="p">(</span><span class="nx">rsp</span><span class="p">.</span><span class="nx">RPCMessage</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">dropStaleResponse</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">leaderState</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">dropStaleResponse</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Otherwise, if the message was successful, we'll update <code>matchIndex</code>
(the last confirmed message stored on the follower) and <code>nextIndex</code>
(the next likely message to send to the follower).</p>
<p>If the message was not successful, we decrement <code>nextIndex</code>. Next time
<code>s.appendEntries()</code> is called it will include one more previous
message for this replica.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Success</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">max</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogIndex</span><span class="o">+</span><span class="nx">lenEntries</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">matchIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Message accepted for %d. Prev Index: %d, Next Index: %d, Match Index: %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">,</span><span class="w"> </span><span class="nx">prev</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">matchIndex</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">max</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Forced to go back to %d for: %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}(</span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
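<p>To see that bookkeeping in isolation, here is a minimal, self-contained sketch. It is not the code from this post: the <code>peer</code> type and the <code>onAppendEntriesResponse</code> and <code>maxU64</code> helpers are hypothetical names, and index 0 is assumed to be a sentinel entry, which is why <code>nextIndex</code> never drops below 1.</p>

```go
package main

import "fmt"

// peer is a hypothetical, stripped-down stand-in for the per-follower
// state the leader tracks.
type peer struct {
	nextIndex  uint64
	matchIndex uint64
}

func maxU64(a, b uint64) uint64 {
	if a > b {
		return a
	}
	return b
}

// onAppendEntriesResponse applies the leader-side bookkeeping.
// prevLogIndex and nEntries are the values that were sent in the request.
func onAppendEntriesResponse(p *peer, prevLogIndex, nEntries uint64, success bool) {
	if success {
		// The follower now stores everything up to prevLogIndex+nEntries.
		p.nextIndex = maxU64(prevLogIndex+nEntries+1, 1)
		p.matchIndex = p.nextIndex - 1
		return
	}
	// Rejected: step back one entry, but never below index 1
	// (assumption: index 0 holds a sentinel entry).
	p.nextIndex = maxU64(p.nextIndex-1, 1)
}

func main() {
	p := &peer{nextIndex: 5}

	onAppendEntriesResponse(p, 4, 0, false) // rejected
	onAppendEntriesResponse(p, 3, 1, false) // rejected again
	fmt.Println(p.nextIndex, p.matchIndex)  // 3 0

	// Accepted: prevLogIndex 2 plus the three entries after it.
	onAppendEntriesResponse(p, 2, 3, true)
	fmt.Println(p.nextIndex, p.matchIndex) // 6 5
}
```

<p>Each rejection costs one round trip per entry here. That is the simplest strategy and keeps the sketch short.</p>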
<p>And we're done with the leader side of append entries!</p>
<h4 id="<code>s.handleappendentriesrequest()</code>"><code>s.HandleAppendEntriesRequest()</code></h4><p>Now for the follower side of log replication. This is, again, an RPC
handler that could be called at any moment. So we need to potentially
update the <code>term</code> (and transition to follower).</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">HandleAppendEntriesRequest</span><span class="p">(</span><span class="nx">req</span><span class="w"> </span><span class="nx">AppendEntriesRequest</span><span class="p">,</span><span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="o">*</span><span class="nx">AppendEntriesResponse</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">updateTerm</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">RPCMessage</span><span class="p">)</span>
</pre></div>
<p>"Hidden" in the "Candidates (§5.2):" section of Figure 2 is an additional rule:</p>
<blockquote><p>If AppendEntries RPC received from new leader: convert to follower</p>
</blockquote>
<p>So we also need to handle that here. And if we're still not a
follower, we'll return immediately.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// From Candidates (§5.2) in Figure 2</span>
<span class="w"> </span><span class="c1">// If AppendEntries RPC received from new leader: convert to follower</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">candidateState</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">followerState</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span>
<span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Success</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">followerState</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Non-follower cannot append entries."</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Next, we also return early if the request term is less than our
own. This would represent an old request.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Dropping request from old leader %d: term %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">LeaderId</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Not a valid leader.</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Now, finally, we know we're receiving a request from a valid
leader. So we need to immediately reset the election timeout.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// Valid leader so reset election.</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">resetElectionTimeout</span><span class="p">()</span>
</pre></div>
<p>Then we do the log comparison to see if we can add the entries sent
from this request. Specifically, we make sure that our log at
<code>req.PrevLogIndex</code> exists and has the same term as <code>req.PrevLogTerm</code>.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">logLen</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">))</span>
<span class="w"> </span><span class="nx">validPreviousLog</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogIndex</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* This is the base case of the induction */</span><span class="w"> </span><span class="o">||</span>
<span class="w"> </span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogIndex</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">logLen</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogIndex</span><span class="p">].</span><span class="nx">Term</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogTerm</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">validPreviousLog</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="s">"Not a valid log."</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
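<p>The same check can be pulled out into a standalone function to make the cases easier to see. This is only a sketch: the free function <code>validPreviousLog</code> and the trimmed-down <code>Entry</code> type are assumptions for illustration, and index 0 is assumed to hold a sentinel entry.</p>

```go
package main

import "fmt"

// Entry is a trimmed-down log entry; only the term matters here.
type Entry struct {
	Term uint64
}

// validPreviousLog reports whether the follower's log contains an entry
// at prevLogIndex whose term equals prevLogTerm.
func validPreviousLog(log []Entry, prevLogIndex, prevLogTerm uint64) bool {
	if prevLogIndex == 0 {
		// Nothing precedes the first real entry, so there is
		// nothing to compare against.
		return true
	}
	return prevLogIndex < uint64(len(log)) &&
		log[prevLogIndex].Term == prevLogTerm
}

func main() {
	// Index 0 is the assumed sentinel entry.
	log := []Entry{{}, {Term: 1}, {Term: 1}, {Term: 2}}

	fmt.Println(validPreviousLog(log, 3, 2)) // true: entry exists, terms match
	fmt.Println(validPreviousLog(log, 3, 3)) // false: terms differ
	fmt.Println(validPreviousLog(log, 7, 2)) // false: no entry at index 7
	fmt.Println(validPreviousLog(log, 0, 0)) // true: start of the log
}
```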
<p>Next, we've got valid entries that we need to add to our log. This
implementation is a little more complex because we'll make use of Go
slice capacity so that <code>append()</code> never allocates.</p>
<p>Importantly, we must truncate the log if a new entry ever conflicts
with an existing one:</p>
<blockquote><p>If an existing entry conflicts with a new one (same index
but different terms), delete the existing entry and all that
follow it (§5.3)</p>
</blockquote>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">next</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogIndex</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="nx">nNewEntries</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">next</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">next</span><span class="o">+</span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">Entries</span><span class="p">));</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Entries</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="nx">next</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">cap</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">newTotal</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">next</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">Entries</span><span class="p">))</span>
<span class="w"> </span><span class="c1">// Second argument must actually be `i`</span>
<span class="w"> </span><span class="c1">// not `0` otherwise the copy after this</span>
<span class="w"> </span><span class="c1">// doesn't work.</span>
<span class="w"> </span><span class="c1">// Only copy until `i`, not `newTotal` since</span>
<span class="w"> </span><span class="c1">// we'll continue appending after this.</span>
<span class="w"> </span><span class="nx">newLog</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="nx">Entry</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">newTotal</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span>
<span class="w"> </span><span class="nb">copy</span><span class="p">(</span><span class="nx">newLog</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newLog</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">))</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Term</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">e</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">prevCap</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">cap</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// If an existing entry conflicts with a new</span>
<span class="w"> </span><span class="c1">// one (same index but different terms),</span>
<span class="w"> </span><span class="c1">// delete the existing entry and all that</span>
<span class="w"> </span><span class="c1">// follow it (§5.3)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[:</span><span class="nx">i</span><span class="p">]</span>
<span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">"Capacity remains the same while we truncated."</span><span class="p">,</span><span class="w"> </span><span class="nb">cap</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">),</span><span class="w"> </span><span class="nx">prevCap</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Appending entry: %s. At index: %d."</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">e</span><span class="p">.</span><span class="nx">Command</span><span class="p">),</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">"Existing log is the same as new log"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Term</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">.</span><span class="nx">Term</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span>
<span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">"Length is directly related to the index."</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)),</span><span class="w"> </span><span class="nx">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="nx">nNewEntries</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
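<p>The capacity assertion above relies on a property of Go slices: reslicing to a shorter length keeps the backing array, so a later <code>append()</code> that fits within the capacity reuses the same memory. A small sketch of just that property, using a hypothetical <code>truncateAndAppend</code> helper over a log of bare terms:</p>

```go
package main

import "fmt"

// truncateAndAppend drops every element from index i onward and then
// appends term. Reslicing keeps the backing array, so the append does
// not allocate as long as the capacity is large enough.
func truncateAndAppend(terms []uint64, i int, term uint64) []uint64 {
	terms = terms[:i]
	return append(terms, term)
}

func main() {
	terms := make([]uint64, 4, 8)
	copy(terms, []uint64{0, 1, 1, 2})

	out := truncateAndAppend(terms, 2, 3)

	fmt.Println(out)                   // [0 1 3]
	fmt.Println(len(out), cap(out))    // 3 8
	fmt.Println(&out[0] == &terms[0])  // true: same backing array
}
```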
<p>Next, if the leader's commit index is ahead of ours, we update the
server's local <code>commitIndex</code> to the min of
<code>req.LeaderCommit</code> and the index of our last log entry.</p>
<p>And finally we persist all these changes and mark the response as
successful.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">LeaderCommit</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">min</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">LeaderCommit</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="nx">nNewEntries</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">nNewEntries</span><span class="p">)</span>
<span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Success</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
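<p>The commit index rule is easy to get subtly wrong, so here is a tiny sketch of it on its own. The function name <code>followerCommitIndex</code> and the <code>minU64</code> helper are made up for this example.</p>

```go
package main

import "fmt"

func minU64(a, b uint64) uint64 {
	if a < b {
		return a
	}
	return b
}

// followerCommitIndex returns the follower's new commit index. It only
// moves forward, and never past the last entry the follower stores.
func followerCommitIndex(commitIndex, leaderCommit, lastLogIndex uint64) uint64 {
	if leaderCommit > commitIndex {
		return minU64(leaderCommit, lastLogIndex)
	}
	return commitIndex
}

func main() {
	// The leader has committed up to 5 but we only store up to 3.
	fmt.Println(followerCommitIndex(2, 5, 3)) // 3
	// We store more than the leader has committed.
	fmt.Println(followerCommitIndex(2, 5, 9)) // 5
	// A leader commit behind ours never moves us backward.
	fmt.Println(followerCommitIndex(4, 3, 9)) // 4
}
```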
<p>So the combined behavior of the leader and follower when replicating
is that the leader walks an out-of-sync follower backward, possibly all
the way to the beginning of the log, until the leader and follower
agree on some first N messages of the log.</p>
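<p>Here is a rough simulation of that back-and-forth, under simplifying assumptions: logs are slices of terms only, index 0 is a sentinel entry, there is no batching limit, and the retries happen in a loop rather than on ticks. The names <code>handle</code> and <code>replicate</code> are hypothetical and stand in for the follower and leader sides.</p>

```go
package main

import "fmt"

// handle is a simplified follower: it accepts the entries only if its
// log matches the leader's at prevIndex, truncating on conflict.
func handle(follower []uint64, prevIndex, prevTerm uint64, entries []uint64) ([]uint64, bool) {
	if prevIndex != 0 &&
		(prevIndex >= uint64(len(follower)) || follower[prevIndex] != prevTerm) {
		return follower, false
	}
	for k, term := range entries {
		i := prevIndex + 1 + uint64(k)
		if i < uint64(len(follower)) && follower[i] != term {
			// Conflict: drop this entry and everything after it.
			follower = follower[:i]
		}
		if i >= uint64(len(follower)) {
			follower = append(follower, term)
		}
	}
	return follower, true
}

// replicate is a simplified leader: it starts at the end of its own log
// and steps nextIndex back by one after every rejection. It returns the
// follower's final log and the number of requests it took.
func replicate(leader, follower []uint64) ([]uint64, int) {
	nextIndex := uint64(len(leader))
	for rounds := 1; ; rounds++ {
		prevIndex := nextIndex - 1
		var ok bool
		follower, ok = handle(follower, prevIndex, leader[prevIndex], leader[nextIndex:])
		if ok {
			return follower, rounds
		}
		// prevIndex 0 is always accepted, so nextIndex stays >= 1.
		nextIndex--
	}
}

func main() {
	leader := []uint64{0, 1, 1, 2, 3}
	follower := []uint64{0, 1, 1, 1, 1} // diverges at index 3

	got, rounds := replicate(leader, follower)
	fmt.Println(got, rounds) // [0 1 1 2 3] 3
}
```

<p>The first two requests are rejected because the terms at index 4 and then index 3 don't match. The third is accepted at index 2, at which point the follower truncates its conflicting tail and appends the leader's entries.</p>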
<h4 id="<code>s.advancecommitindex()</code>"><code>s.advanceCommitIndex()</code></h4><p>Now when not just one follower but a quorum of servers (the leader
counts itself) all have a matching first N messages, the leader can
advance the cluster's <code>commitIndex</code>.</p>
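<p>As a standalone sketch of the quorum rule: find the highest index that a majority of servers have stored. The function name <code>highestQuorumIndex</code> is hypothetical, and unlike the implementation in this post it assumes the leader's own entry in <code>matchIndex</code> is set to its last log index rather than special-casing the leader. It also leaves out the check from Figure 2 of the Raft paper that the entry at the new index belongs to the leader's current term.</p>

```go
package main

import "fmt"

// highestQuorumIndex returns the largest index above commitIndex that a
// majority of servers have stored, or commitIndex if there is none.
func highestQuorumIndex(matchIndex []uint64, commitIndex, lastLogIndex uint64) uint64 {
	quorum := len(matchIndex)/2 + 1
	for i := lastLogIndex; i > commitIndex; i-- {
		n := 0
		for _, m := range matchIndex {
			if m >= i {
				n++
			}
		}
		if n >= quorum {
			return i
		}
	}
	return commitIndex
}

func main() {
	// Three servers: the leader (at 5) and two followers.
	fmt.Println(highestQuorumIndex([]uint64{5, 3, 2}, 1, 5)) // 3
	// Five servers: only two have index 7, but four have index 4.
	fmt.Println(highestQuorumIndex([]uint64{7, 7, 4, 4, 2}, 2, 7)) // 4
	// No majority beyond the current commit index.
	fmt.Println(highestQuorumIndex([]uint64{5, 1, 1}, 1, 5)) // 1
}
```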
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">advanceCommitIndex</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="c1">// Leader can update commitIndex on quorum.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">leaderState</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">lastLogIndex</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">lastLogIndex</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">--</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">isLeader</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">j</span><span class="p">].</span><span class="nx">matchIndex</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">isLeader</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">quorum</span><span class="o">--</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">i</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"New commit index: %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
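<p>To look at the quorum counting in isolation, here is a minimal standalone sketch. The names <code>entry</code> and <code>nextCommitIndex</code> are made up for this example and are not part of the code in this post. It also adds one condition that the method above does not show: Figure 2 of the Raft paper only lets a leader advance <code>commitIndex</code> to an index <code>N</code> when <code>log[N].term == currentTerm</code>.</p>

```go
package main

import "fmt"

// entry is a hypothetical, stripped-down log entry. Only the term matters here.
type entry struct {
	term uint64
}

// nextCommitIndex returns the highest index n > commitIndex such that a
// majority of the cluster (leader included) holds log[n] and log[n] was
// written in currentTerm. It returns commitIndex unchanged otherwise.
//
// matchIndex holds one value per follower. The leader counts implicitly.
func nextCommitIndex(log []entry, matchIndex []uint64, commitIndex, currentTerm uint64) uint64 {
	if len(log) == 0 {
		return commitIndex
	}
	quorum := (len(matchIndex)+1)/2 + 1
	for n := uint64(len(log) - 1); n > commitIndex; n-- {
		// Entries from older terms are never committed by counting replicas.
		if log[n].term != currentTerm {
			continue
		}
		votes := 1 // the leader always has its own entry
		for _, m := range matchIndex {
			if m >= n {
				votes++
			}
		}
		if votes >= quorum {
			return n
		}
	}
	return commitIndex
}

func main() {
	// Index 0 is a placeholder entry, followed by two entries from term 1
	// and one entry from term 2.
	log := []entry{{0}, {1}, {1}, {2}}

	// Three-node cluster: one follower has caught up to index 3.
	fmt.Println(nextCommitIndex(log, []uint64{3, 1}, 0, 2))

	// No follower has index 3 yet, and index 2 is from an older term.
	fmt.Println(nextCommitIndex(log, []uint64{2, 1}, 0, 2))
}
```

<p>In the second call nothing is committed even though index 2 is on a majority, because that entry is from an older term.</p>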
<p>And for every state a server might be in, if there are messages
committed but not applied, we'll apply one here. And importantly,
we'll pass the result back to the message's result channel if it
exists, so that <code>s.Apply()</code> can learn about the result.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">lastApplied</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">s</span><span class="p">.</span><span class="nx">lastApplied</span><span class="p">]</span>
<span class="w"> </span><span class="c1">// len(log.Command) == 0 is a noop committed by the leader.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">log</span><span class="p">.</span><span class="nx">Command</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">"Entry applied: %d."</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">lastApplied</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// TODO: what if Apply() takes too long?</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">statemachine</span><span class="p">.</span><span class="nx">Apply</span><span class="p">(</span><span class="nx">log</span><span class="p">.</span><span class="nx">Command</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Will be nil for follower entries and for no-op entries.</span>
<span class="w"> </span><span class="c1">// Not nil for all user submitted messages.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">result</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">result</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="nx">ApplyResult</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Result</span><span class="p">:</span><span class="w"> </span><span class="nx">res</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Error</span><span class="p">:</span><span class="w"> </span><span class="nx">err</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">lastApplied</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<h3 id="heartbeats">Heartbeats</h3><p>Heartbeats combine log replication and leader election. Heartbeats
stave off leader election (follower timeouts). And heartbeats also
bring followers up-to-date if they are behind.</p>
<p>And it's a simple method. If it's time to heartbeat, we call
<code>s.appendEntries()</code>. That's it.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">heartbeat</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span>
<span class="w"> </span><span class="nx">timeForHeartbeat</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">After</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatTimeout</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">timeForHeartbeat</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatTimeout</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">Add</span><span class="p">(</span><span class="nx">time</span><span class="p">.</span><span class="nx">Duration</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatMs</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Millisecond</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="s">"Sending heartbeat"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">appendEntries</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>This staves off leader election because every request sent by
<code>s.appendEntries()</code>, whether it carries 0 or N entries, comes from a
valid leader and thus causes the followers to reset their election timeout.</p>
<p>And that's the entirety of (the basics of) Raft.</p>
<p>There are probably bugs.</p>
<h3 id="running-kvapi">Running kvapi</h3><p>Now let's run the key-value API.</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span><span class="nb">cd</span><span class="w"> </span>cmd/kvapi<span class="w"> </span><span class="o">&&</span><span class="w"> </span>go<span class="w"> </span>build
<span class="gp">$ </span>rm<span class="w"> </span>*.dat
</pre></div>
<h4 id="terminal-1">Terminal 1</h4><div class="highlight"><pre><span></span><span class="gp">$ </span>./kvapi<span class="w"> </span>--node<span class="w"> </span><span class="m">0</span><span class="w"> </span>--http<span class="w"> </span>:2020<span class="w"> </span>--cluster<span class="w"> </span><span class="s2">"0,:3030;1,:3031;2,:3032"</span>
</pre></div>
<h4 id="terminal-2">Terminal 2</h4><div class="highlight"><pre><span></span><span class="gp">$ </span>./kvapi<span class="w"> </span>--node<span class="w"> </span><span class="m">1</span><span class="w"> </span>--http<span class="w"> </span>:2021<span class="w"> </span>--cluster<span class="w"> </span><span class="s2">"0,:3030;1,:3031;2,:3032"</span>
</pre></div>
<h4 id="terminal-3">Terminal 3</h4><div class="highlight"><pre><span></span><span class="gp">$ </span>./kvapi<span class="w"> </span>--node<span class="w"> </span><span class="m">2</span><span class="w"> </span>--http<span class="w"> </span>:2022<span class="w"> </span>--cluster<span class="w"> </span><span class="s2">"0,:3030;1,:3031;2,:3032"</span>
</pre></div>
<h4 id="terminal-4">Terminal 4</h4><p>Remember that requests must go through the leader (unless we
turn that off in the <code>/get</code> request). So you'll have to try sending a
message to each server until you find the leader.</p>
<p>To set a key:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>curl<span class="w"> </span><span class="s1">'http://localhost:2020/set?key=y&value=hello'</span>
</pre></div>
<p>To get a key:</p>
<div class="highlight"><pre><span></span><span class="gp">$ </span>curl<span class="w"> </span>http://localhost:2020/get<span class="se">\?</span>key<span class="se">\=</span>y
</pre></div>
<p>And that's that! Try killing a server and restarting it. A new leader
will be elected so you'll need to find the right one to send requests
to again. But all existing entries should still be there.</p>
<h3 id="a-test-rig">A test rig</h3><p>I won't cover the <a href="https://github.com/eatonphil/goraft/blob/main/cmd/sim/main.go">implementation of my test
rig</a> in
this post but I will describe it.</p>
<p>It's nowhere near Jepsen but it does have a specific focus:</p>
<ol>
<li>Can the cluster elect a leader?</li>
<li>Can the cluster store logs correctly?</li>
<li>Can the cluster of three nodes tolerate one node down?</li>
<li>How fast can it store N messages?</li>
<li>Are messages recovered correctly when the nodes shut down and start back up?</li>
<li>If a node's logs are deleted, is the log for that node recovered after it is restarted?</li>
</ol>
<p>This implementation passes these tests and handles around 20k-40k entries/second.</p>
<h3 id="considerations">Considerations</h3><p>This was quite a challenging project. Normally when I hack on stuff
like this I have TV (The Simpsons) on in the background. It's sort of
dumb but this was the first project where I absolutely could not focus
with that background noise.</p>
<p>There is a tedious number of conditions and I am not sure I got them
all (right). There are numerous ways for subtle bugs to creep in.</p>
<h4 id="race-conditions-and-deadlocks">Race conditions and deadlocks</h4><p>It's very easy to introduce race conditions. Thankfully Go has the
<code>-race</code> flag, which detects data races as the program runs. This helps
you check that you are locking read and write access to shared variables
when necessary.</p>
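<p>As a small illustration (not code from this project), here is a counter shared between goroutines. The <code>counter</code> type and <code>countConcurrently</code> function are hypothetical names. With the mutex in place, <code>go run -race</code> stays quiet. If you remove the <code>Lock</code>/<code>Unlock</code> calls, the race detector should report the unsynchronized access to <code>n</code>. Keep in mind that it only reports races that actually happen during a run.</p>

```go
package main

import (
	"fmt"
	"sync"
)

// counter is shared state guarded by a mutex.
type counter struct {
	mu sync.Mutex
	n  int
}

// inc takes the lock for the write.
func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// value takes the lock for the read too. Reads race with writes as well.
func (c *counter) value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently increments one counter from several goroutines and
// returns the final value once they have all finished.
func countConcurrently(workers, perWorker int) int {
	var c counter
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.inc()
			}
		}()
	}
	wg.Wait()
	return c.value()
}

func main() {
	fmt.Println(countConcurrently(8, 1000))
}
```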
<p>On the other side of race conditions is something Go does not help
you out with: deadlocks. Once you've got locks in place for shared
variables, you need to make sure you're releasing the locks appropriately too.</p>
<p>Thankfully someone wrote a swap-in replacement for the Go <code>sync</code>
package called
<a href="https://github.com/sasha-s/go-deadlock">go-deadlock</a>. When you import
this package instead of the default <code>sync</code> package, it will panic and
give you a stacktrace when it thinks you hit a deadlock.</p>
<p>Sometimes it thinks you hit a deadlock because a method that needs a
lock takes too long. Sometimes the time it takes is legitimate (or
something you haven't optimized yet). But its default of
<code>30s</code> is not really aggressive at all.</p>
<p>So I normally set the deadlock timeout to <code>2s</code> and eventually would
like to make that more like <code>100ms</code>:</p>
<div class="highlight"><pre><span></span>sync.Opts.DeadlockTimeout = 2000 * time.Millisecond
</pre></div>
<p>It's mostly the <code>persist()</code> function that causes <code>go-deadlock</code> to
think there's a deadlock because it tries to synchronously write a
bunch of data to disk.</p>
<h5 id="<code>go-deadlock</code>-is-slow"><code>go-deadlock</code> is slow</h5><p>The <code>go-deadlock</code> package is incredibly useful. But don't forget to
turn it off for benchmarks. With it on I get around 4-8k
entries/second. With it off I get around 20k-40k entries/second.</p>
<h4 id="unbounded-memory">Unbounded memory</h4><p>Another issue in this implementation is that the log keeps growing
indefinitely <em>and</em> the entire log is duplicated in memory.</p>
<p>There are two ways to deal with that:</p>
<ol>
<li>Implement Raft snapshotting so the log can be truncated safely.</li>
<li>Keep only some number of entries in memory (say, 1 million) and
read from disk as needed when logs need to be verified. In ideal
operation this would never happen since ideally all servers are
always on, never miss entries, and just keep appending. But "ideal"
won't always happen.</li>
</ol>
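<p>The second option could look something like the sketch below: a fixed-size ring buffer that keeps only the newest entries in memory. All names here (<code>entry</code>, <code>logWindow</code>, <code>newLogWindow</code>) are hypothetical. A miss from <code>get</code> is the signal to go read from disk, which this sketch leaves out. It also ignores truncation of conflicting entries, which a real Raft log needs.</p>

```go
package main

import "fmt"

// entry is a hypothetical, simplified log entry.
type entry struct {
	term    uint64
	command []byte
}

// logWindow keeps only the newest len(buf) entries in memory.
type logWindow struct {
	buf  []entry // fixed-size ring buffer
	next uint64  // log index the next appended entry will get
}

// newLogWindow creates a window holding at most max entries.
// max must be greater than zero.
func newLogWindow(max int) *logWindow {
	return &logWindow{buf: make([]entry, max)}
}

// append stores e, overwriting the oldest in-memory entry once the
// window is full, and returns the log index assigned to e.
func (w *logWindow) append(e entry) uint64 {
	idx := w.next
	w.buf[idx%uint64(len(w.buf))] = e
	w.next++
	return idx
}

// get returns the entry at log index idx. ok is false when the entry is
// not in memory: either it was evicted (read it from disk instead) or it
// has not been written yet.
func (w *logWindow) get(idx uint64) (e entry, ok bool) {
	if idx >= w.next || w.next-idx > uint64(len(w.buf)) {
		return entry{}, false
	}
	return w.buf[idx%uint64(len(w.buf))], true
}

func main() {
	w := newLogWindow(3)
	for term := uint64(0); term < 5; term++ {
		w.append(entry{term: term})
	}

	_, ok := w.get(1)
	fmt.Println("index 1 in memory:", ok)

	e, ok := w.get(4)
	fmt.Println("index 4 in memory:", ok, "term:", e.term)
}
```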
<p>Similarly, there is unbounded and unreused channel creation for
notifying <code>s.Apply()</code> when the user-submitted message(s) finish.</p>
<h4 id="net/rpc-and-encoding/gob">net/rpc and encoding/gob</h4><p>In the <code>persist()</code> section above I already mentioned how I prototyped
this using Go's built-in gob encoding and how inefficient it was. It's
also pretty slow. I learned that because <code>net/rpc</code> uses it, and after
everything else I did, <code>net/rpc</code> started to be the
bottleneck in my benchmarks. This isn't incredibly surprising.</p>
<p>So a future version of this code might implement its own protocol and
own encoding (like we did for disk) on top of TCP rather than use
<code>net/rpc</code>.</p>
<h4 id="jepsen">Jepsen</h4><p>Everyone wants to know how a distributed algorithm does against
<a href="https://github.com/jepsen-io/jepsen">Jepsen</a>, which tests
linearizability of distributed systems in the face of network and
process faults.</p>
<p>But the setup is not trivial so I haven't hooked it up to this project
yet. This would be a good area for future work.</p>
<h4 id="election-timeout-and-the-environment">Election timeout and the environment</h4><p>One thing I noticed as I was trying out alternatives to <code>net/rpc</code>
(alternatives that injected latency to simulate a bad environment) is
that election timeouts should probably be tuned with latency of the
cluster in mind.</p>
<p>If the election timeout is every <code>300ms</code> but the latency of the
cluster is near <code>1s</code>, you're going to have non-stop leader election.</p>
<p>When I adjusted the election timeout to be every <code>2s</code> when the latency
of the cluster is near <code>1s</code>, everything was fine. Maybe this means
there's a bug in my code but I don't think so.</p>
<h4 id="client-request-serial-identifier">Client request serial identifier</h4><p>One major part of the Raft protocol I did not cover is that the client
is supposed to send a serial identifier for each message sent to the
cluster. This is to ensure that messages are not accidentally
duplicated at any level of the entire software stack.</p>
<p><a href="https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf">Diego Ongaro's
thesis</a>
goes into more detail about this than the Raft paper. Search in that
PDF for "session".</p>
<p>Again I just completely ignored the possibility of duplicate messages
in this implementation so far.</p>
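<p>For a rough idea of what that would involve, here is a sketch of the state machine side: remember the last serial number and result per client, and return the cached result instead of applying a duplicate. This is my simplified reading of the "session" idea, not an implementation of the thesis. Every name here is hypothetical, and it assumes each client has at most one request outstanding at a time.</p>

```go
package main

import "fmt"

// request is a hypothetical client message tagged with who sent it and
// a per-client serial number that increases with each new request.
type request struct {
	clientID uint64
	serial   uint64
	command  string
}

// session remembers the most recent request handled for one client.
type session struct {
	lastSerial uint64
	lastResult string
}

// stateMachine applies commands at most once per (clientID, serial).
type stateMachine struct {
	applied  []string
	sessions map[uint64]session
}

func newStateMachine() *stateMachine {
	return &stateMachine{sessions: make(map[uint64]session)}
}

// apply runs r.command unless this client's serial number was already
// seen, in which case the cached result is returned instead.
// It assumes one outstanding request per client.
func (sm *stateMachine) apply(r request) string {
	if s, ok := sm.sessions[r.clientID]; ok && r.serial <= s.lastSerial {
		return s.lastResult // duplicate: do not apply again
	}
	sm.applied = append(sm.applied, r.command)
	result := fmt.Sprintf("applied #%d", len(sm.applied))
	sm.sessions[r.clientID] = session{lastSerial: r.serial, lastResult: result}
	return result
}

func main() {
	sm := newStateMachine()
	req := request{clientID: 1, serial: 1, command: "set y hello"}

	fmt.Println(sm.apply(req))
	fmt.Println(sm.apply(req)) // a retry of the same request
	fmt.Println("commands applied:", len(sm.applied))
}
```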
<h3 id="references">References</h3><p>Finally, I could not have done this without a bunch of internet
help. This project took me about 7 months in total. The first 5 months
I was trying to figure it out mostly on my own, just looking at the
Raft paper.</p>
<p>The biggest breakthrough came from discovering the TLA+ spec for Raft
written by Raft's author. Formal methods sound scary but it was truly not
too bad! This was the first "implementation" of Raft that was in a
single file of code. And under 500 lines.</p>
<p>Jack Vanlightly's guide to reading TLA+ helped a bunch.</p>
<p>Finally, I had to peer at other implementations, especially to figure
out locking and avoiding deadlocks.</p>
<p>Here's everything that helped me out.</p>
<ul>
<li><a href="https://raft.github.io/raft.pdf">In Search of an Understandable Consensus Algorithm</a>: The Raft paper.</li>
<li><a href="https://github.com/ongardie/raft.tla/blob/master/raft.tla">raft.tla</a>: Diego Ongaro's TLA+ spec for Raft.</li>
<li>Jon Gjengset's <a href="https://thesquareplanet.com/blog/students-guide-to-raft/">Students' Guide to Raft</a></li>
<li>Jack Vanlightly's <a href="https://medium.com/splunk-maas/detecting-bugs-in-data-infrastructure-using-formal-methods-704fde527c58">Detecting Bugs in Data Infrastructure using Formal Methods (TLA+ Series Part 1)</a>: An intro to TLA+.</li>
</ul>
<p>And useful implementations I looked at for inspiration and clarity.</p>
<ul>
<li>Hashicorp's <a href="https://github.com/hashicorp/raft">Raft implementation</a> in Go: Although it's often quite complicated to learn from since it actually is intended for production.</li>
<li>Eli Bendersky's <a href="https://github.com/eliben/raft">Raft implementation</a> in Go: Although I got confused following it since it used signed integers and <code>-1</code> to represent base cases. Signed integers is a fair choice as far as I can tell, I just wanted to only use unsigned integers.</li>
<li>Jing Yang's <a href="https://github.com/ditsing/ruaft">Raft implementation</a> in Rust: Although I find Rust hard to read.</li>
</ul>
<p>And I haven't tried these but they look cool:</p>
<ul>
<li><a href="https://jepsen.io/services#training">Raft course taught by Jepsen</a></li>
<li><a href="https://www.dabeaz.com/raft.html">Raft course taught by David Beazley</a></li>
</ul>
<p>Cheers!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote about implementing Raft in Go. By far the most challenging project I've worked on in spare time. About 7 months sporadically.<br><br>I'm not an expert, and this is not intended to be used in production. I wanted a better background on the subject!<a href="https://t.co/EhyBuQ4pD3">https://t.co/EhyBuQ4pD3</a> <a href="https://t.co/vGhBbV1shf">pic.twitter.com/vGhBbV1shf</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1661720451616210944?ref_src=twsrc%5Etfw">May 25, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/2023-05-25-raft.htmlThu, 25 May 2023 00:00:00 +0000
- Two books I recommend to developershttp://notes.eatonphil.com/books-developers-should-read.html<p class="note">
Originally published on February 1, 2021. The original version
included two books I don't think are actually so worthwhile. This
list is down to two. I think that's a good thing actually.
</p><p>These are the books I recommend to developers wanting to improve their
skills as professional programmers because of high information
density, believable premises/examples, and being well edited.</p>
<p>You don't need to read books to improve as a developer but
they are unparalleled in quickly helping you gain depth in a subject.</p>
<h3 id="high-performance-browser-networking">High Performance Browser Networking</h3><p>If you deal with networks, you would probably benefit from this book.
It is a thorough high level introduction to mobile networks, browser
network protocols, and fundamentals of networking.</p>
<h3 id="designing-data-intensive-applications">Designing Data-Intensive Applications</h3><p>If you use a database (including an in-memory array of items you
search periodically) or if you build APIs, you would probably benefit
from this book. A solid introduction to distributed computing, data
transfer, indexing, etc.</p>
<h3 id="that's-it!">That's it!</h3><p>Generic software books conspicuously not on this list for
me:</p>
<ul>
<li>Clean Code</li>
<li>JavaScript the Good Parts</li>
<li>Design Patterns/Gang of Four</li>
<li>Structure and Interpretation of Computer Programs</li>
<li>A Philosophy of Software Design</li>
</ul>
<p>They're not all bad but give nowhere near as much return for the
investment of your time.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Four books I recommend to professional developers wanting to improve their craft, and a few I'd not<a href="https://t.co/1aTrfqZ9bd">https://t.co/1aTrfqZ9bd</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1356391931274756096?ref_src=twsrc%5Etfw">February 2, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/books-developers-should-read.htmlTue, 16 May 2023 00:00:00 +0000
- My favorite software subredditshttp://notes.eatonphil.com/high-quality-subreddits-you-should-be-following.html<p class="note">
Originally published on December 5, 2021.
</p><p>If you are an experienced software developer whose only exposure to
reddit is dank memes, <a href="https://reddit.com/r/programming">proggit</a> or even
language-specific subreddits like
<a href="https://reddit.com/r/python">/r/python</a>, you're missing out.</p>
<p>What follows are my favorite subreddits in tech. My criteria are that:</p>
<ul>
<li>The subreddit topic is relevant to advancing as a programmer</li>
<li>Posts generally go into good depth</li>
<li>The comments stay on topic</li>
<li>And the shit-posting is minimal</li>
</ul>
<p>This list isn't hard to guess at if you consider advanced topics
in software. But I wanted to share because I think it's worth
explicitly supporting high-quality subreddits.</p>
<ul>
<li><a href="https://www.reddit.com/r/EmuDev/">/r/EmuDev</a><ul>
<li>My favorite sub of all. Also has a <a href="https://www.reddit.com/r/EmuDev/comments/9mop2q/join_the_official_remudev_chat_on_discord/">phenomenal Discord group</a>.</li>
</ul>
</li>
<li><a href="https://www.reddit.com/r/programminglanguages">/r/ProgrammingLanguages</a><ul>
<li>Focuses a little more on PLT topics (parsing techniques, syntax, type systems) than on compiling and interpreting techniques, but still good.</li>
</ul>
</li>
<li><a href="https://www.reddit.com/r/DatabaseDevelopment/">/r/DatabaseDevelopment</a><ul>
<li>All about database internals, which ends up involving a bunch of
correctness and distributed systems stuff as well.</li>
<li>Disclosure: I run this sub. It's at 2.7k+ members at time of publishing.</li>
</ul>
</li>
<li><a href="https://www.reddit.com/r/ReverseEngineering/">/r/ReverseEngineering</a><ul>
<li>The largest subreddit on this list but still has pretty good posts.</li>
</ul>
</li>
<li><a href="https://www.reddit.com/r/esolangs/">/r/EsoLangs</a><ul>
<li>One of the best/most fun intros to programming languages/compilers/interpreters is through languages like Brainfuck. This sub does a good job of keeping the fun going.</li>
</ul>
</li>
<li><a href="https://www.reddit.com/r/Compilers/">/r/Compilers</a></li>
<li><a href="https://www.reddit.com/r/GraphicsProgramming/">/r/GraphicsProgramming</a></li>
</ul>
<p>While some language subreddits are pretty good, they are more so a
mixed bag than some of the topic-specific subreddits here. So they
don't make my list, more on principle than anything else.</p>
<p>If there is a good one already, send me it!</p>
<h3 id="what-am-i-missing?">What am I missing?</h3><p>Am I missing other amazing subreddits? Just don't say
language-specific ones. :)</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">It's an incorrect meme IMO that tech Reddit is low-quality. You just have to find the interesting subreddits.<br><br>I've updated my list for 2023.<a href="https://t.co/OtM2tk8HOn">https://t.co/OtM2tk8HOn</a> <a href="https://t.co/ymyzChp0SO">pic.twitter.com/ymyzChp0SO</a></p>— Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1658567638090391555?ref_src=twsrc%5Etfw">May 16, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/high-quality-subreddits-you-should-be-following.htmlTue, 16 May 2023 00:00:00 +0000
- Errors and Zighttp://notes.eatonphil.com/errors-and-zig.html<p>At TigerBeetle these last few weeks I've been doing a mix of
documenting client libraries, writing sample code for client
libraries, and writing integration tests against the sample code.</p>
<p>The client library documentation is generated with a Zig script. The
sample code is integration tested with a Zig script. A bunch of Zig
scripts.</p>
<p>It's not the same
<a href="https://github.com/tigerbeetledb/tigerbeetle/blob/main/docs/TIGER_STYLE.md">rigorous</a>
sort of Zig as the main database. (We're generally more lax about
scripts and test code.)</p>
<p><em>And I'm specifically writing this post on my personal blog since my
script code is not under incredible scrutiny.</em></p>
<p>Furthermore, I'm still new to Zig. Since I'm still learning, there
have been a few things that tripped me up.</p>
<p>And now that I've written this out, I realize most of my stumbling was
related to errors.</p>
<h3 id="failure">Failure</h3><p>Lots of things in programs allocate memory. This sounds dumb and
obvious but before programming in Zig I did not appreciate how many of
the operations I'm used to allocate memory. I've previously only
programmed in GC languages that do the allocations behind the scenes.</p>
<p>Furthermore, memory allocation can fail. Zig makes allocation failures
explicit. So lots of things in Zig code need to handle failure.</p>
<p>Selectively omitting error handling is not allowed:</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="k">fn</span><span class="w"> </span><span class="n">thing</span><span class="p">(</span><span class="n">a</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="p">}</span>
<span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">arena</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">();</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">thing</span><span class="p">(</span><span class="n">allocator</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Run <code>zig run test.zig</code>:</p>
<div class="highlight"><pre><span></span>test.zig:4:23:<span class="w"> </span>error:<span class="w"> </span>error<span class="w"> </span>is<span class="w"> </span>ignored
<span class="w"> </span>std.fmt.allocPrint<span class="o">(</span>a,<span class="w"> </span><span class="s2">""</span>,<span class="w"> </span>.<span class="o">{})</span><span class="p">;</span>
<span class="w"> </span>~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~
test.zig:4:23:<span class="w"> </span>note:<span class="w"> </span>consider<span class="w"> </span>using<span class="w"> </span><span class="s1">'try'</span>,<span class="w"> </span><span class="s1">'catch'</span>,<span class="w"> </span>or<span class="w"> </span><span class="s1">'if'</span>
referenced<span class="w"> </span>by:
<span class="w"> </span>main:<span class="w"> </span>test.zig:12:9
<span class="w"> </span>callMain:<span class="w"> </span>/home/phil/vendor/zig-linux-x86_64-0.11.0-dev.2213+515e1c93e/lib/std/start.zig:617:32
<span class="w"> </span>remaining<span class="w"> </span>reference<span class="w"> </span>traces<span class="w"> </span>hidden<span class="p">;</span><span class="w"> </span>use<span class="w"> </span><span class="s1">'-freference-trace'</span><span class="w"> </span>to<span class="w"> </span>see<span class="w"> </span>all<span class="w"> </span>reference<span class="w"> </span>traces
</pre></div>
<p>This ends up meaning lots of code like:</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">do_stuff</span><span class="p">(</span>
<span class="w"> </span><span class="n">alloc</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="c1">// Let's assume this is an arena allocator so I don't care about freeing.</span>
<span class="w"> </span><span class="n">stuff</span><span class="o">:</span><span class="w"> </span><span class="n">Stuff</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">([]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">alloc</span><span class="p">);</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="o">&</span><span class="p">[</span><span class="n">_</span><span class="p">][]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">{</span>
<span class="w"> </span><span class="s">"first of something"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"one more"</span><span class="p">,</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">stuff</span><span class="p">.</span><span class="n">thing</span><span class="p">);</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span class="s">"build some string {s}.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">stuff</span><span class="p">.</span><span class="n">athing</span><span class="p">}));</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">other_stuff</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span class="s">"things... {s}"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">blah</span><span class="p">});</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">do_other_stuff</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">other_stuff</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>You have <code>try</code>-es all over the place.</p>
<h3 id="limits-of-<code>try</code>">Limits of <code>try</code></h3><p>Now I don't have a problem with acknowledging that allocations can
fail, at least outside of scripts. In scripts like the ones I've been
writing, though, I don't really care.</p>
<p>Having all of those <code>try</code>-es is just extra typing all over the place.</p>
<p>It would be nice if I could have instead done:</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">do_stuff</span><span class="p">(</span>
<span class="w"> </span><span class="n">alloc</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="c1">// Let's assume this is an arena allocator so I don't care about freeing.</span>
<span class="w"> </span><span class="n">stuff</span><span class="o">:</span><span class="w"> </span><span class="n">Stuff</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">([]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">alloc</span><span class="p">);</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="o">&</span><span class="p">[</span><span class="n">_</span><span class="p">][]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">{</span>
<span class="w"> </span><span class="s">"first of something"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"one more"</span><span class="p">,</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">stuff</span><span class="p">.</span><span class="n">thing</span><span class="p">);</span>
<span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span class="s">"build some string {s}.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">stuff</span><span class="p">.</span><span class="n">athing</span><span class="p">}));</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">other_stuff</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span class="s">"things... {s}"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">blah</span><span class="p">});</span>
<span class="w"> </span><span class="n">do_other_stuff</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">other_stuff</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>But Zig's <code>try</code> doesn't work like that. I'm not sure why not. The
Zig developers are sensible so I'm sure there's a good reason.</p>
<p>Still, are there other options?</p>
<h3 id="<code>catch-unreachable</code>"><code>catch unreachable</code></h3><p>So the problem isn't just that you have to acknowledge memory
allocation failures but that these failures within every helper
function need to be acknowledged by the caller of the helper
function. Failures infiltrate the entire call tree.</p>
<p>Now of course these potential failures would exist whether or not Zig
exposed them. So I don't mean to say it's Zig's fault for exposing
them.</p>
<p>But you can avoid failure handling: instead of <code>try</code>-ing everything,
mark error conditions as <code>unreachable</code>.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">do_stuff</span><span class="p">(</span>
<span class="w"> </span><span class="n">alloc</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="c1">// Let's assume this is an arena allocator so I don't care about freeing.</span>
<span class="w"> </span><span class="n">stuff</span><span class="o">:</span><span class="w"> </span><span class="n">Stuff</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">([]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">alloc</span><span class="p">);</span>
<span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="o">&</span><span class="p">[</span><span class="n">_</span><span class="p">][]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">{</span>
<span class="w"> </span><span class="s">"first of something"</span><span class="p">,</span>
<span class="w"> </span><span class="s">"one more"</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">stuff</span><span class="p">.</span><span class="n">thing</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span class="s">"build some string {s}.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">stuff</span><span class="p">.</span><span class="n">athing</span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">other_stuff</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span class="s">"things... {s}"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">blah</span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="n">do_other_stuff</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">other_stuff</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>As you can see from the function signature, this function no longer
returns any error at all. But it could possibly panic.</p>
<p>Now in scripts, for things like memory allocations that can fail, I
actually think it's reasonable to mark allocation failures as
unreachable.</p>
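<p>In a script, one way to cut down on the typing is a small wrapper
around the allocating call. Here's a minimal sketch, assuming a
hypothetical helper named <code>mustPrint</code> (it is not part of the standard
library). It swaps in <code>@panic</code> rather than <code>unreachable</code> so that an
allocation failure is a panic in every build mode:</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical helper (not in std): format a string, panic if allocation fails.
fn mustPrint(alloc: std.mem.Allocator, comptime fmt: []const u8, args: anytype) []u8 {
    return std.fmt.allocPrint(alloc, fmt, args) catch @panic("out of memory");
}

pub fn main() void {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();

    // No try at the call site, and main no longer returns an error.
    const s = mustPrint(arena.allocator(), "things... {d}", .{3});
    std.debug.print("{s}\n", .{s});
}
</pre></div>
<p>The trade-off is the same as with <code>catch unreachable</code>: callers can no
longer react to the failure, which is probably only acceptable in
throwaway scripts.</p>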
<p>But I took it a bit further, using <code>@panic</code> or <code>unreachable</code> for
general failure conditions too.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">run</span><span class="p">(</span><span class="n">alloc</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">cmds</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ChildProcess</span><span class="p">.</span><span class="n">exec</span><span class="p">(.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">alloc</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">argv</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cmds</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="nb">@panic</span><span class="p">(</span><span class="s">"Failed to run command."</span><span class="p">);</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">res</span><span class="p">.</span><span class="n">term</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">Exited</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">code</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">code</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">@panic</span><span class="p">(</span><span class="s">"Expected command to succeed."</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<h3 id="handling-panics">Handling panics</h3><p>But there are some things that will fail quite frequently (like
running subprocesses or interacting with the filesystem in general).</p>
<p>Panicking (like what happens if <code>@panic()</code> <s>or <code>unreachable</code></s> is hit) in
these situations is all good until you have things that you want to
get cleaned up.</p>
<p class="note">
My <a href="https://matklad.github.io/">coworker</a> points out I'm
wrongly conflating <code>unreachable</code>
and <code>@panic()</code> since depending on the release mode,
hitting <code>unreachable</code> is actually undefined behavior
whereas <code>@panic()</code> is always a panic.
</p><p>Panics don't trigger <code>defer</code> or <code>errdefer</code> statements. So if you have
a script that starts a background process or creates a temporary
directory, and if you panic in that script, the script won't be able
to run <code>defer</code> steps to stop the background process or delete the
temporary directory.</p>
<p>There are panic handlers in Zig (not yet documented, Ctrl-f for "TODO:
pub fn panic" in the <a href="https://ziglang.org/documentation/master/">Zig
docs</a>). But I'd just be
getting further from what seems sensible if I went in that direction.</p>
<h3 id="zig-errors">Zig errors</h3><p>So I stopped panic-ing everywhere and switched to using real Zig
errors, like:</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">run</span><span class="p">(</span><span class="n">alloc</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">cmds</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ChildProcess</span><span class="p">.</span><span class="n">exec</span><span class="p">(.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">alloc</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">argv</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cmds</span><span class="p">,</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">res</span><span class="p">.</span><span class="n">term</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">Exited</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">code</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">code</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Expected command to succeed.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">RunCommandFailed</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>It's pretty sweet. You get to make up a new <code>error</code> value wherever
you'd like, and Zig infers the function's error set for you.</p>
<p>It is unfortunate you can't (currently) include a payload with the
error return value. There's an <a href="https://github.com/ziglang/zig/issues/2647">active issue discussing
it</a>.</p>
<p>But so far I've been able to work around that, as seen in the example
above, by logging before returning an error. Most of the time the
payload you want to return is just detailed information to provide
context, so a log line covers it.</p>
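<p>Boiled down, the workaround looks something like this. It's only a
sketch: <code>checkExit</code> is a hypothetical helper, and the exit code stands in
for whatever payload you wish the error could carry. Run it with
<code>zig test</code>:</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical helper: the exit code is the "payload" we would like to attach.
fn checkExit(code: u8) error{RunCommandFailed}!void {
    if (code != 0) {
        // The error value can't carry the code, so log the context first.
        std.debug.print("Expected command to succeed, got exit code {d}.\n", .{code});
        return error.RunCommandFailed;
    }
}

test "a non-zero exit code is logged and reported as an error" {
    try checkExit(0);
    try std.testing.expectError(error.RunCommandFailed, checkExit(2));
}
</pre></div>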
<p>This logging is fine in a CLI application but probably not everything
you'd want in a library. I'm not sure.</p>
<p>And now without panics, functions that return errors and use
<code>try</code> work with <code>defer</code> and <code>errdefer</code> again! Cleanup of my
background processes and temporary directories happens like I want.</p>
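<p>Here's a small sketch of that behavior. The names are made up, and two
booleans stand in for the real cleanup work (stopping a process,
deleting a directory). Run it with <code>zig test</code>:</p>
<div class="highlight"><pre><span></span>const std = @import("std");

var cleaned_up = false;
var rolled_back = false;

// Hypothetical step that fails on demand.
fn step(fail: bool) !void {
    if (fail) return error.StepFailed;
}

fn script(fail: bool) !void {
    defer cleaned_up = true; // runs on every exit from this function
    errdefer rolled_back = true; // runs only when an error is returned
    try step(fail);
}

test "defer and errdefer both run when try propagates an error" {
    cleaned_up = false;
    rolled_back = false;
    try std.testing.expectError(error.StepFailed, script(true));
    try std.testing.expect(cleaned_up);
    try std.testing.expect(rolled_back);
}

test "only defer runs on success" {
    cleaned_up = false;
    rolled_back = false;
    try script(false);
    try std.testing.expect(cleaned_up);
    try std.testing.expect(!rolled_back);
}
</pre></div>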
<h3 id="handling-errors-with-<code>if</code>">Handling errors with <code>if</code></h3><p>Ok so now that I'm fully bought into Zig errors there were still a few
more things that tripped me up.</p>
<p>First is that you can handle errors a few ways. You already saw the
first one with <code>try</code>.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">thingThatCouldFail</span><span class="p">();</span>
</pre></div>
<p>This causes the enclosing function to short-circuit, returning the
error to its caller immediately, if <code>thingThatCouldFail</code> returns an error.</p>
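<p>Put differently, <code>try x</code> is shorthand for <code>x catch |err| return err</code>.
A quick sketch to show the two forms side by side (the <code>fail</code>
parameter is an assumption added only to drive the example). Run it
with <code>zig test</code>:</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical function; the fail parameter exists only for the example.
fn thingThatCouldFail(fail: bool) error{Oops}!u32 {
    if (fail) return error.Oops;
    return 1;
}

fn withTry(fail: bool) !u32 {
    const x = try thingThatCouldFail(fail);
    return x + 1;
}

fn withCatch(fail: bool) !u32 {
    // The long-hand spelling of the try above.
    const x = thingThatCouldFail(fail) catch |err| return err;
    return x + 1;
}

test "both forms return the value or propagate the error" {
    try std.testing.expectEqual(@as(u32, 2), try withTry(false));
    try std.testing.expectEqual(@as(u32, 2), try withCatch(false));
    try std.testing.expectError(error.Oops, withTry(true));
    try std.testing.expectError(error.Oops, withCatch(true));
}
</pre></div>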
<p>But then I wanted to retry a function that could fail in a loop after
handling the error.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="o">:</span><span class="w"> </span><span class="n">SomeType</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">somedefault</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">tries</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">thingThatCouldFail</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">good_value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">good_value</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// do something that should fix it for the next time</span>
<span class="w"> </span><span class="n">tries</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>But that isn't valid Zig. The Zig docs show an example of how you
can use <code>if</code> with a function that returns an error union:</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">doAThing</span><span class="p">(</span><span class="n">str</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">parseU64</span><span class="p">(</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">))</span><span class="w"> </span><span class="o">|</span><span class="n">number</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">doSomethingWithNumber</span><span class="p">(</span><span class="n">number</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">Overflow</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// handle overflow...</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="c1">// we promise that InvalidChar won't happen (or crash in debug mode if it does)</span>
<span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">InvalidChar</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>But I don't care about the error at this moment (maybe I should, but I
don't right now).</p>
<p>So I tried:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="o">:</span><span class="w"> </span><span class="n">SomeType</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">somedefault</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">tries</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">thingThatCouldFail</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">good_value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">good_value</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// do something that should fix it for the next time</span>
<span class="w"> </span><span class="n">tries</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>But that gives me an obscure type error.</p>
<p>I was stumped here for a while until I decided to try the whole syntax
in that example. And it turns out that at least the capture part is
necessary at the parser layer:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="o">:</span><span class="w"> </span><span class="n">SomeType</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">somedefault</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">tries</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">thingThatCouldFail</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">good_value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">good_value</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// do something that should fix it for the next time</span>
<span class="w"> </span><span class="n">tries</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And eventually I guessed an unnamed error variable might also work
without the switch, and that was correct:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="o">:</span><span class="w"> </span><span class="n">SomeType</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">somedefault</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">tries</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">thingThatCouldFail</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">good_value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">good_value</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">|</span><span class="n">_</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// do something that should fix it for the next time</span>
<span class="w"> </span><span class="n">tries</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Nice!</p>
<h3 id="<code>catch</code>-blocks"><code>catch</code> blocks</h3><p>One last thing that I was stumbling around with was that when you use
<code>catch</code> with a function that returns either an error or some non-void value,
the catch must "return" a value of the same type as that non-void value.</p>
<p>The Zig docs show a simple example:</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parseU64</span><span class="p">(</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="mi">13</span><span class="p">;</span>
</pre></div>
<p>But I also use <code>catch</code> with blocks sometimes:</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parseU64</span><span class="p">(</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// do some more complex stuff, maybe log, who knows</span>
<span class="p">};</span>
</pre></div>
<p>But that won't compile: the block evaluates to <code>void</code>, not the integer that <code>number</code> needs. So the "trick" is to combine Zig's <a href="https://ziglang.org/documentation/master/#Blocks">named
blocks</a> with
<code>catch</code>.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parseU64</span><span class="p">(</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="n">blk</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// do some more complex stuff, maybe log, who knows</span>
<span class="w"> </span><span class="c1">// and then "return" a result</span>
<span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="o">:</span><span class="n">blk</span><span class="w"> </span><span class="mi">13</span><span class="p">;</span>
<span class="p">};</span>
</pre></div>
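<p>Here is a self-contained sketch of the same pattern that also captures the
error with <code>|err|</code> so it can be logged. It rests on assumptions that are not
in the docs example: <code>std.fmt.parseInt</code> stands in for <code>parseU64</code>, and the
input string is made up so that the <code>catch</code> path runs.</p>

```zig
const std = @import("std");

pub fn main() void {
    // Hypothetical input: anything that is not a base-10 integer takes the catch path.
    const str = "not a number";

    // std.fmt.parseInt is an assumed stand-in for the parseU64 from the Zig docs.
    const number = std.fmt.parseInt(u64, str, 10) catch |err| blk: {
        // The captured error can be logged before falling back.
        std.debug.print("could not parse '{s}': {s}\n", .{ str, @errorName(err) });
        // The labeled break supplies the u64 that the catch must produce.
        break :blk 13;
    };

    std.debug.print("number = {d}\n", .{number});
}
```

<p>If you don't need the error itself, drop the <code>|err|</code> capture and the
labeled block works the same way.</p>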
<h3 id="contributing-to-zig-docs">Contributing to Zig docs</h3><p>I didn't want to write this post without offering some of my examples
to the docs. While there's a dedicated effort around autodoc, the tool
that builds docs for the standard library, I haven't yet stumbled on
docs for contributing to the main Zig docs.</p>
<p>So I grepped the Zig repo with <code>git grep 'Blocks are expressions.'</code>, a
phrase that showed up in the HTML docs, and found
<code>doc/langref.html.in</code>.</p>
<p>Then someone on the <a href="https://discord.gg/gxsFFjE">Zig Programming Language
Discord</a> pointed me at running
<code>zig build docs</code> in the repo root to generate the HTML.</p>
<p>And now I've got a <a href="https://github.com/ziglang/zig/pull/15042">PR up</a>!
We'll see what folks think.</p>
http://notes.eatonphil.com/errors-and-zig.htmlTue, 21 Mar 2023 00:00:00 +0000
- Notes from Neal Gabler's Walt Disneyhttp://notes.eatonphil.com/2023-02-18-neal-gabler-walt-disney-notes.html<p>Disney was a celebrity by his mid-30s; Disney the company was famous
by the 1930s.</p>
<p>Even though politically the 1930s was considered the decade of
Roosevelt (inaugurated as President in 1933), culturally the 1930s was
considered the decade of Mickey Mouse.</p>
<p>They would experiment with almost every new animation/filmmaking
technique in shorts (Silly Symphonies) before applying it to big
films like Snow White. Examples of this include:</p>
<ul>
<li>Multiple layers of animation moving independently to create depth in
<a href="https://www.youtube.com/watch?v=MYEmL0d0lZE">The Old Mill</a></li>
<li>The first Disney animations with humans (not flora/fauna) like <a href="https://www.youtube.com/watch?v=SRB2YlQOSBI">The
Cookie Carnival</a></li>
</ul>
<p>Nobody took animation seriously or thought there was much
possibility for it in film. Disney kept pushing the envelope. Some
examples include:</p>
<ul>
<li>Not including the hand inside the drawing (<a href="https://www.youtube.com/watch?v=ERokauUI6TA">though early Disney ones
did</a>)</li>
<li>Eventually focusing on actual stories, not just gags/jokes</li>
<li>Sound (the famous Mickey <a href="https://www.youtube.com/watch?v=BBgghnQF6E4">Steamboat Willie
animation</a>, <a href="https://www.loc.gov/static/programs/national-film-preservation-board/documents/steamboat_willie.pdf">read
more</a>)</li>
<li>Merchandise, not just the art</li>
<li><a href="https://www.rarenewspapers.com/view/557744">Feature films (i.e. Snow White in 1937, the first animated feature
film), not just shorts</a></li>
<li>Brought Hollywood to Television<ul>
<li>"Walt Disney signed an exclusive long-term contract today with
the American Broadcasting Company to become the first leading
Hollywood producer to enter into formal alliance with
television." <a href="https://timesmachine.nytimes.com/timesmachine/1954/04/03/84611681.html?pageNumber=19">NY
Times</a></li>
</ul>
</li>
<li>Added <a href="https://d23.com/the-wonderful-things-about-walt-disneys-wonderful-world-of-color/">color to TV
shows</a></li>
</ul>
<h3 id="snow-white">Snow White</h3><p>Disney hired fine arts teachers to come and teach employees. From time
to time he forced the artists to take night classes.</p>
<p>They trained for years(?) before <em>starting</em> the animation of Snow
White and did almost all the animation in the last 10 months or so
before the release in December 1937.</p>
<p>They had to do 24-hour animation in 8-hour shifts to get up to
speed. They had to hire hundreds of animators to do fill-in work so the
“master” animators could focus on “drawing the extremes”.</p>
<p>The average age at Disney was 25. These days of the 1930s really felt
quite similar to what a Silicon Valley startup is thought to be.</p>
<p>Disney preferred to hire recent art school students so they could
train them in the Disney style.</p>
<p>They could not animate humans during Snow White well enough so they
ended up just tracing them, a technique called
<a href="https://imgur.com/gallery/IZkSR">rotoscoping</a>.</p>
<p>The Snow White voice cast were quite famous at the time. We wouldn't
know it now but it was basically an ensemble cast.</p>
<h3 id="world-war-2">World War 2</h3><p>Ran low on money so they produced films for the <a href="https://en.wikipedia.org/wiki/List_of_Walt_Disney%27s_World_War_II_productions_for_Armed_Forces">US
Government</a>. <a href="https://www.smithsonianmag.com/history/how-disney-propaganda-shaped-life-on-the-home-front-during-wwii-180979057/">Propaganda</a>,
basically. But also <a href="https://www.youtube.com/watch?v=kRVFQs2XYy4">instructional
videos</a>.</p>
<p><a href="https://animationguild.org/about-the-guild/disney-strike-1941/">Disney workers began striking
(1941)</a>
and established unions. If Disney was a dick before this, he became a
much bigger dick after this.</p>
<h3 id="post-war">Post War</h3><p>Got into television with ABC initially. First Hollywood company to do
so. Arrangement with ABC was in part to finance Disneyland. (Not
covered in the book but Disney <a href="https://www.nytimes.com/1995/08/01/business/media-business-merger-walt-disney-acquire-abc-19-billion-deal-build-giant-for.html">eventually took over
ABC</a>,
though not before splitting from ABC and working with NBC.)</p>
<p>Disney stopped caring about films and moved to mostly thinking about
Disneyland, working on it under WED (what is now Walt Disney Imagineering).</p>
<p>After Disneyland launched he moved on to world's fairs and eventually
Disneyworld. He died of lung cancer before completing Disneyworld.</p>
<h3 id="tidbits">Tidbits</h3><ul>
<li><a href="https://www.disneyplus.com/video/aa400cf1-a54d-4187-997d-573711c88697">The Reluctant
Dragon</a>,
a throwaway film because they needed money when they went public. It
is the story of a children's book author trying to get Disney to
make a film out of his book. He stumbles around the new Disney
Burbank Studio through art classes and musicians practicing,
uncovering how Disney films are made in the process.</li>
</ul>
<h3 id="questions">Questions</h3><ul>
<li>What were the other major animation studios? Even if Snow White was
the first animated feature film, surely others must have rushed to
copy the success. Who were they?<ul>
<li>UPA (Mr Magoo) was one. Also Warner Brothers</li>
</ul>
</li>
</ul>
<h3 id="conclusion">Conclusion</h3><p>Basically at every turn he'd get tired of the stuff he had already
done (and killed at doing) and move on to something new. From animated shorts
to feature films to television to Disneyland to Disneyworld and EPCOT.</p>
<p>To his employees he was a huge dick. They'd be in constant fear of
upsetting him and getting fired. And he admitted that he would
basically fire people randomly. He'd fire anyone important enough to
get their name on a door (i.e. establish their own fiefdom within the
company). But it seems more like Disney the company worked in spite of
this rather than because of this.</p>
<p><strong>After Mary Poppins (1964, two years before he died): "I'm on the
spot. I have to keep trying to keep up to that same level. And the way
to do it is not to worry, not to get tense. Not to think, 'I got to
beat Mary Poppins', 'I got to beat Mary Poppins'. The way to do it is
just to go off and get interested in some little thing, some little
idea that interests me. Some little idea that looks like fun."</strong></p>
http://notes.eatonphil.com/2023-02-18-neal-gabler-walt-disney-notes.htmlSat, 18 Feb 2023 00:00:00 +0000
- Lessons learned streaming building a Scheme-like interpreter in Gohttp://notes.eatonphil.com/2023-01-30-livescheme.html<p>I wanted to practice making coding videos so I did a <a href="https://www.youtube.com/watch?v=lZNhZI-dN9k&list=PLjJMyANAIVHEgUOK2cU0hrvSwFPNHT2a7">four-part
series</a>
on writing a basic Scheme-like language (minus macros and arrays and
tons of stuff).</p>
<p>I picked this simple topic because I wanted a low-stakes way to learn
what I did not know about making videos.</p>
<p>Here was the end result (nothing crazy):</p>
<div class="highlight"><pre><span></span><span class="nv">$</span><span class="w"> </span><span class="nv">go</span><span class="w"> </span><span class="nv">build</span>
<span class="nv">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="nv">examples/fib</span><span class="o">.</span><span class="nv">scm</span>
<span class="p">(</span><span class="nf">func</span><span class="w"> </span><span class="nv">fib</span><span class="w"> </span><span class="p">(</span><span class="nf">a</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb"><</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span>
<span class="w"> </span><span class="nv">a</span>
<span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="p">(</span><span class="nf">fib</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="p">(</span><span class="nf">fib</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="mi">2</span><span class="p">)))))</span>
<span class="p">(</span><span class="nf">fib</span><span class="w"> </span><span class="mi">11</span><span class="p">)</span>
<span class="nv">$</span><span class="w"> </span><span class="o">.</span><span class="nv">/livescheme</span><span class="w"> </span><span class="nv">examples/fib</span><span class="o">.</span><span class="nv">scm</span>
<span class="mi">89</span>
</pre></div>
<p>The code for the project is
<a href="https://github.com/eatonphil/livescheme">here</a>.</p>
<h3 id="video-archives">Video archives</h3><p>Here are the four episodes! Each about an hour long. One per week for
four weeks.</p>
<ul>
<li><a href="https://www.youtube.com/watch?v=lZNhZI-dN9k">Part 1: A lexer</a></li>
<li><a href="https://www.youtube.com/watch?v=5ttFEPQopXc">Part 2: Parsing</a></li>
<li><a href="https://www.youtube.com/watch?v=YwmGcverSHI">Part 3: AST walking interpreter</a></li>
<li><a href="https://www.youtube.com/watch?v=skDhTWILH8I">Part 4: Cleanup and Fibonacci</a></li>
</ul>
<h3 id="live-live">Live live</h3><p>The videos were <a href="https://twitch.tv/eatonphil">streamed to Twitch</a>
live.</p>
<p>I didn't prep for them because I wanted to show warts and all. The
thought process.</p>
<p>But some things turned out to be tricky to explain without preparation
(function calling conventions, mostly).</p>
<p>Overall hopefully the series was somewhat useful.</p>
<h3 id="full-screen-windows">Full screen windows</h3><p>In the first episode I didn't make sure that the terminal window
was captured full screen. So some of my code went off the bottom of
the video. That was dumb.</p>
<p>I even have a tmux mode-line at the bottom of the terminal app that I
could have looked for to notice it didn't exist in the OBS view.</p>
<p>So I made sure to have the full window in view after the first
episode.</p>
<h3 id="twitch-moderation">Twitch moderation</h3><p><a href="https://safety.twitch.tv/s/article/Protect-your-channel-with-Shield-Mode">Twitch Shield
Mode</a>
is great. But the default setting prevents folks from commenting live
until they've followed you for 2 weeks or something.</p>
<p>For someone starting a channel that doesn't make much sense. So in my
first video I disabled it so folks could chat. And then some crypto
scammer came in. Go figure.</p>
<p>After the first video I turned Shield Mode back on but set the minimum
follow time to 10 minutes I think.</p>
<h3 id="obs-studio">OBS Studio</h3><p>I used <a href="https://obsproject.com/">OBS Studio</a> to record. I was
frustrated with it for a while because the video would lag so much
when I tested out streaming. After playing around with Twitch Studio
and giving up on it for being too simple, I messed with OBS video
settings enough to get my video to not lag. Unfortunately I can't
remember what settings I used.</p>
<h3 id="noise-gate-/-pop-filter">Noise Gate / pop filter</h3><p>The <a href="https://obsproject.com/kb/noise-gate-filter">Noise Gate Filter</a>
is awesome. My mechanical keyboard sounded obnoxious before I turned
it on. I was considering getting a pop filter but then discovered that
the Noise Gate Filter is built in, you just have to turn it on.</p>
<h3 id="scenes">Scenes</h3><p>It also took me a while to understand OBS Scenes but then I realized I
can use them to have an intro graphic (without the mic on!), a main
coding scene (focused on my terminal and with my webcam overlaid),
and a "back soon" graphic if I needed it.</p>
<p>To get the mic off you have to <a href="https://obsproject.com/forum/threads/mute-one-specific-scene.43661/">disable the mic
globally</a>
(it's on globally by default) and then add it as an input only to the
scenes you want.</p>
<h3 id="storage-and-export-to-youtube">Storage and export to YouTube</h3><p>Twitch doesn't store streams by default. You have to turn on <a href="https://help.twitch.tv/s/article/video-on-demand?language=en_US">Video on
Demand</a>.</p>
<p>Even when it's turned on the videos only seem to be stored for 1 week. Maybe that's configurable but I didn't see it.</p>
<p>In any case it's not a problem because you can set up a YouTube
connection. Then after a stream is complete you find the stream video
and click Export. It takes about a minute to upload the hour-long
videos I did. Though YouTube post-processing took a while longer after
that.</p>
<h3 id="next?">Next?</h3><p>I'm forced to take a break from recording these videos for the next
two weeks since I'll be <a href="https://systemsdistributed.com/">in Cape
Town</a>.</p>
<p>I haven't decided yet if I'll continue this series (not something I'm
extremely excited about since everyone builds a Scheme-like language).</p>
<p>I'd like to have a project that I can keep contributing to over time
but I don't see very much value in doing that based on a Scheme or any
lisp-like.</p>
<p>Maybe I'll do a basic JavaScript implementation next. Or another basic
SQL database. Dunno.</p>
http://notes.eatonphil.com/2023-01-30-livescheme.htmlMon, 30 Jan 2023 00:00:00 +0000
- An effective product managerhttp://notes.eatonphil.com/effective-product-manager.html<p>There are three specific activities I have loved in some product
managers I've worked with (and missed in others).</p>
<p>tl;dr:</p>
<ul>
<li>Talk with customers and prospects</li>
<li>Develop and share a vision</li>
<li>Evangelize</li>
</ul>
<h3 id="talk-with-customers-and-prospects">Talk with customers and prospects</h3><p>As a product manager, your superpower over engineering is to have
spent time with customers and prospects. You should have (or develop)
a good understanding of the market and your product's potential.</p>
<p>The only way you can do this is by spending time, over time, with
customers and prospects. Understanding their workflows and their
issues.</p>
<h3 id="develop-and-share-a-vision">Develop and share a vision</h3><p>Cynical folks will cringe at the word "vision" but it is a serious and
necessary part of a successful organization.</p>
<p>As a product manager, you should establish and share a path for
engineering to follow based on your understanding of customers,
prospects, the market, and the company.</p>
<p>This is the "roadmap" and "prioritization". But prioritization is
useless without a long-term vision.</p>
<p>The roadmap should represent (and broadly demonstrate) a concrete and
meaningful goal. A goal that you can and should adjust over time as
the company and market change.</p>
<h3 id="evangelize">Evangelize</h3><p>In bigger organizations there might be dedicated evangelism teams. But
product managers must drive this work.</p>
<p>Evangelism should fit the vision you've developed.</p>
<p>And in the absence of dedicated evangelism teams, product managers
should be creating demos, writing blog posts, and testing the solution
with customers and prospects.</p>
<p>Again, it's fine for dedicated teams outside of product management to
do bits of that work. But it must be driven and led by the product
manager.</p>
<h3 id="it's-hard">It's hard</h3><p>Observed as I have from outside, being an effective product
manager feels like a massively challenging task.</p>
<p>It's so easy to go without talking to customers, to get sucked into
day-to-day issues and not create a vision, and to allow evangelism to
happen ad-hoc.</p>
<p>Then there's the fact you don't live in a vacuum. You may have a boss
in product management. Your engineering peers may have competing
priorities. You may have a hard time understanding the founders or
CEO. In a large company, you may not even have a CEO.</p>
<h3 id="my-ideas,-your-ideas">My ideas, your ideas</h3><p>These are my ideas based on my
<a href="https://eatonphil.com/">experience</a>. You may have your own ideas. If
mine help you, great! If they don't, great!</p>
http://notes.eatonphil.com/effective-product-manager.htmlMon, 23 Jan 2023 00:00:00 +0000
- The year in books: 2022http://notes.eatonphil.com/2023-01-12-year-in-books.html<p>In 2022 I <a href="https://www.goodreads.com/challenges/11636-2022-reading-challenge">finished 20
books</a>
spanning 15,801 pages. 3 more than I read in 2021, but about twice
the number of pages. 3 fiction and 17 non-fiction. Another ~30 started
but not finished.</p>
<p>I had a hard time reading books while I was trying to start my own
company. But I also discovered audiobooks. I would put on a book and
listen while I did my chores. Only 5 of the 20 books I finished were
physical (or kindle) books. The other 15 were audiobooks.</p>
<h3 id="non-fiction:-13-to-recommend">Non-fiction: 13 to recommend</h3><p>After I started reading Robert Caro's Master of the Senate I got hooked
on history and felt less daunted about larger books.</p>
<p>The only non-fiction I read in 2022 was US and UK history.</p>
<p>Here were my favorites:</p>
<ul>
<li><a href="https://www.goodreads.com/book/show/86525.Master_of_the_Senate">Master of the
Senate</a>
by Robert Caro: Covering more than just Lyndon B. Johnson but the
history of the Senate and the Civil Rights movements in the US. This
book is now on my <a href="https://lists.eatonphil.com/book-recommendations.html">list of best
books</a>.</li>
<li><a href="https://www.goodreads.com/book/show/19809.The_Last_Lion">The Last Lion: Winston Spencer Churchill: Visions of Glory,
1874-1932</a>
by William Manchester: First in a three-volume series about
Churchill. He's an especially interesting guy to read about because
he served in UK politics from 1901 until his retirement (for the second
time) as UK Prime Minister in 1955. He was First Lord of the
Admiralty in World War 1 before he, more famously, became Prime
Minister during World War 2. This entire series is on my <a href="https://lists.eatonphil.com/book-recommendations.html">list of best
books</a>.</li>
<li><a href="https://www.goodreads.com/book/show/42547.The_Autobiography_of_Martin_Luther_King_Jr_">The Autobiography of Martin Luther King,
Jr.</a>:
Sad and revealing. Though it doesn't talk much about his legacy
since it only includes his writings.</li>
<li><a href="https://www.goodreads.com/book/show/13049569-the-passage-of-power">Passage of
Power</a>
by Robert Caro: Covering LBJ's pathetic failed attempts at the
presidency before becoming JFK's Vice President, up to JFK's
assassination. Still a very good book. I can't wait for Caro's final
book to come out.</li>
<li><a href="https://www.goodreads.com/book/show/2279.Truman">Truman</a> by David
McCullough: I always thought Truman was a lame nerd but he actually
had a very interesting life (and as I'd later discover, is far from
the lamest president. Wilson hands down takes that place.) And
unlike most other famous politicians I read about, he had a great
relationship with his wife. He was honest and respectable and was
the first US president to normalize relations with Mexico since the
Mexican-American War (that U.S. Grant and Robert E. Lee fought in
the 1840s).</li>
<li><a href="https://www.goodreads.com/book/show/55751.The_Last_Lion">The Last Lion: Winston Spencer Churchill: Alone,
1932-40</a> by
William Manchester: The second book in the series. Pretty
depressing because it's a decade of Churchill noticing Nazi German
behavior and stressing UK preparedness and the UK ignoring him and
Nazi Germany.</li>
<li><a href="https://www.goodreads.com/book/show/746673.The_Last_Lion">The Last Lion: Winston Spencer Churchill: Defender of the Realm,
1940-1965</a>
by William Manchester: The final book in the series, covering his
Prime Ministership.</li>
<li><a href="https://www.goodreads.com/book/show/884536.Eleanor_Roosevelt_Volume_1">Eleanor Roosevelt, Volume 1: The Early Years,
1884-1933</a>
by Blanche Wiesen Cook: Her background and many problems, as the
daughter of Theodore Roosevelt's brother and later wife of their
distant cousin, are pretty hard to relate to. Still it was quite
interesting to hear about her life and early activities, and how she
became such an outspoken progressive activist after being quite
conservative.</li>
<li><a href="https://www.goodreads.com/book/show/17082810-abraham-lincoln">Abraham Lincoln: A Life, Volume One</a> by Michael Burlingame</li>
<li><a href="https://www.goodreads.com/book/show/17082819-abraham-lincoln">Abraham Lincoln: A Life, Volume Two</a> by Michael Burlingame</li>
<li><a href="https://www.goodreads.com/book/show/34237826-grant">Grant</a> by Ron
Chernow: Among famous generals of the Civil War, somehow Robert
E. Lee and Stonewall Jackson came to mind more readily for me than
Grant. I'm glad I read this book because the popularity of Southern
generals today seems like revisionism. This book makes strong
arguments that while Lee was a great officer, he could only think in
terms of short-term tactics and the Virginia region. Whereas Grant
was the first (US, anyway) officer to consider and command (via
telegraph) all theaters of war at once, every day. And this book
redeems his presidency somewhat. His progressive adoption of freed
Black people and work to make them equal citizens is highly
commendable. Even with the horror of what happened in the South
after the war ended.</li>
<li><a href="https://www.goodreads.com/book/show/40929.The_Rise_of_Theodore_Roosevelt">The Rise of Theodore
Roosevelt</a>
by Edmund Morris: First in a three-volume series about the 26th
President. I read somewhere that it can feel impossible to read a
bad biography of Roosevelt because he was such an interesting
human. That may be true. This book didn't disappoint. Roosevelt
growing up in a townhouse in Manhattan, going to Harvard, buying a
farm on Long Island is all hard to relate to. His Puritanical morals
and machismo were also difficult to get past. But he was a very
interesting guy.</li>
<li><a href="https://www.goodreads.com/book/show/40923.Theodore_Rex">Theodore
Rex</a> by
Edmund Morris: Second in the series, covering the entirety of
Roosevelt's presidency. Like the first volume, a great read. I
always used to think Roosevelt was a pure war-monger. But he helped
avert war with the UK and Germany over Venezuelan debt-default. And
he later received the Nobel Peace Prize for mediating peace between
Japan and Russia in 1905.</li>
</ul>
<h3 id="fiction:-1-to-recommend">Fiction: 1 to recommend</h3><p>Of the three I read last year, I really enjoyed one:</p>
<ul>
<li><a href="https://www.goodreads.com/book/show/18950097-the-leopard">The
Leopard</a>
by Giuseppe Tomasi di Lampedusa: A gentle piece of historical
fiction set during the 1860s in Sicily during and after the
unification of Italy. I learned about this book from a Rick Stein
episode in the Mediterranean Escapes series.</li>
</ul>
http://notes.eatonphil.com/2023-01-12-year-in-books.htmlThu, 12 Jan 2023 00:00:00 +0000
- Favorite compiler and interpreter resourceshttp://notes.eatonphil.com/2023-01-04-compiler-resources.html<head>
<meta http-equiv="refresh" content="4;URL='https://lists.eatonphil.com/compilers-and-interpreters.html'" />
</head><p>This is an external post of mine. Click
<a href="https://lists.eatonphil.com/compilers-and-interpreters.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2023-01-04-compiler-resources.htmlThu, 05 Jan 2023 00:00:00 +0000
- General book recommendationshttp://notes.eatonphil.com/2023-01-04-book-recommendations.html<head>
<meta http-equiv="refresh" content="4;URL='https://lists.eatonphil.com/book-recommendations.html'" />
</head><p>This is an external post of mine. Click
<a href="https://lists.eatonphil.com/book-recommendations.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2023-01-04-book-recommendations.htmlWed, 04 Jan 2023 00:00:00 +0000
- In response to a frontend developer asking about database developmenthttp://notes.eatonphil.com/2023-01-01-letter-to-a-frontend-developer-asking-about-database-development.html<head>
<meta http-equiv="refresh" content="4;URL='https://letters.eatonphil.com/2023-01-01-letter-to-a-frontend-developer-asking-about-database-development.html'" />
</head><p>This is an external post of mine. Click
<a href="https://letters.eatonphil.com/2023-01-01-letter-to-a-frontend-developer-asking-about-database-development.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2023-01-01-letter-to-a-frontend-developer-asking-about-database-development.htmlSun, 01 Jan 2023 00:00:00 +0000
- Is it worth writing about?http://notes.eatonphil.com/is-it-worth-writing-about.html<p>You acquire a skill or experience through time and effort,
then downplay the impact of writing and sharing the learning
process.</p>
<p>Professionals seem naturally to imagine a high bar for what
is worth writing about.</p>
<p>I think that's misguided. This article is not criticism of folks with
these beliefs, but rather encouragement for folks looking for a reason
to write.</p>
<p>There are (at least) a few concrete reasons to write about what you've
learned, even when you don't think it's novel.</p>
<h3 id="to-practice-writing">To practice writing</h3><p>This is the easiest reason. While practice does not imply improvement,
you cannot improve without practice.</p>
<p>Every time you learn something is a chance to write down both what
you've learned and also how you learned it.</p>
<p>For professional developers this chance happens all the time. Daily,
really. But most developers, even those who want to write more, let
the opportunity slip.</p>
<h3 id="providing-variety">Providing variety</h3><p>When I learn a topic I normally go through dozens of posts, papers,
docs, videos or books to find a version that clicks. If I can. I
prefer to start with blog posts and often there are not blog posts on
the subject. Books, docs, videos, and academic papers aren't often as
accessible.</p>
<p>Even if you're writing about a popular topic, there's still a chance
your post gets through to someone in a way other posts do not.</p>
<p class="note">
For programmers there are notorious topics you can avoid if
you'd like ("What is a monad", "Why is lisp interesting", "Kubernetes
sucks"). Or not. I've fallen into those traps.
</p><p>Additionally, as you gain experience as a programmer (or product
manager, or whatever), your perspective and approach become both more
interesting and more valuable.</p>
<p>I don't recall ever thinking: "I wish they'd write less". But I'm
always wishing some folks wrote more, or at all.</p>
<p>Some experienced folks who write about widely varied topics in
software include:</p>
<ul>
<li><a href="https://eli.thegreenplace.net/">Eli Bendersky</a></li>
<li><a href="https://nullprogram.com/blog/2015/03/19/">Chris Wellons</a></li>
<li>And <a href="https://zserge.com/">Serge Zaitsev</a></li>
</ul>
<p>But experience need not be a prerequisite. Experts (who don't practice
explaining) easily forget how they came to their current
understanding. A beginner's experience is valuable for everyone who is
not a beginner, sometimes also for beginners.</p>
<h3 id="to-cement-understanding">To cement understanding</h3><p>Finally, honest writing <em>forces</em> you to either understand the dark
corners of what you've learned or to ask for help in these dark
corners.</p>
<p>I have repeatedly wrestled with topics in software only to be further
forced to explain <em>why</em> (or <em>how</em>) when I write.</p>
<p>And it has often forced me to restructure code or ideas in ways that
are easier to explain. I think that's a pretty valuable act for the
long-term.</p>
<h3 id="bad-faith">Bad faith</h3><p>There's a bad faith argument that you sometimes see. Here's a
variation that comes to mind.</p>
<blockquote><p>The internet is already full of crap. People who aren't experts are
just making it worse.</p>
</blockquote>
<p>I hope you ignore these comments. :) If there's a quality problem that
is genuinely causing harm, that's for search engines and trade
organizations to deal with.</p>
<h3 id="in-the-extreme">In the extreme</h3><p><a href="https://til.simonwillison.net/">Simon Willison's TIL</a> site is the
most prolific version of this I've ever seen. I don't know if I
personally aspire to Simon's level, but I think it's worth seeing.</p>
<h3 id="topics">Topics</h3><p>Some topics I think are always worth writing about and sharing:</p>
<ul>
<li>Your process for figuring something out, failures and successes included</li>
<li>How to hack on some major open source project</li>
<li>In-depth comparison of projects or approaches, down to source code, benchmarks, and architecture when relevant</li>
<li>Building minimal versions of some production system</li>
<li>How some major system works under the hood, down to the code</li>
<li>Mistakes you made in structuring organizations, or production architecture, or testing, etc.</li>
<li>How to get the dang configuration right for testing Electron apps in GitHub Actions</li>
</ul>
<p>For programming posts specifically: I strongly encourage you to
include or walk through working code. Have tests. And have the code
build process hooked up to GitHub Actions or SourceHut CI or
whatever. This helps ensure your work is still relevant over time.</p>
<h3 id="when-you-write">When you write</h3><p>Write to explain and teach. When you don't understand something, call
out that you don't understand it. That's not a bad thing, and the
internet is normally happy to help.</p>
<p>Don't shy away from showing code, showing things that broke, showing
the ugly process. It's encouraging for others to see.</p>
<h3 id="end-goal">End goal</h3><p>Well, ideally we have fewer clickbait "5 best React alternatives"
articles and more thoughtful pieces intended to teach and educate with
a bit of rigor.</p>
<p>It's better for individuals and for companies. It's better for the
internet.</p>
<h3 id="community">Community</h3><p>If you want a community of folks where you can find encouragement to
write and eyes to review drafts, check out the #writing-and-drafts
channel on the <a href="https://eatonphil.com/discord.html">Software Internals Discord</a>.</p>
<h3 id="is-it-worth-writing-about?">Is it worth writing about?</h3><p>Well if you come to me I'm almost surely going to say yes. Poor
Betteridge.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a short post as a bit of encouragement to folks who want to write more but imagine a high bar for what's worthwhile.<br><br>tldr; if you ask me it's almost always going to be a yes. And I think there's a path toward a higher-quality internet.<a href="https://t.co/Nn6BvXhNdZ">https://t.co/Nn6BvXhNdZ</a> <a href="https://t.co/KELvsxnr2w">pic.twitter.com/KELvsxnr2w</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1598441836284203011?ref_src=twsrc%5Etfw">December 1, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/is-it-worth-writing-about.htmlThu, 01 Dec 2022 00:00:00 +0000
- A Programmer-Friendly I/O Abstraction Over io_uring and kqueuehttp://notes.eatonphil.com/a-friendly-abstraction-over-iouring-and-kqueue.html<head>
<meta http-equiv="refresh" content="4;URL='https://tigerbeetle.com/blog/a-friendly-abstraction-over-iouring-and-kqueue/'" />
</head><p>This is an external post of mine. Click
<a href="https://tigerbeetle.com/blog/a-friendly-abstraction-over-iouring-and-kqueue/">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/a-friendly-abstraction-over-iouring-and-kqueue.htmlWed, 23 Nov 2022 00:00:00 +0000
- Writing a SQL database, take two: Zig and RocksDBhttp://notes.eatonphil.com/zigrocks-sql.html<p>For my second project while learning Zig, I decided to port an
old, minimal SQL database project from Go to Zig.</p>
<p>In this post, in ~1700 lines of code (yes, I'm sorry it's bigger than
my usual), we'll create a basic embedded SQL database in Zig on top of
RocksDB. Other than the RocksDB layer it will not use third-party
libraries.</p>
<p>The code for this project is available on <a href="https://github.com/eatonphil/zigrocks">GitHub</a>.</p>
<p>Here are a few example interactions we'll support:</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script <(echo "CREATE TABLE y (year int, age int, name text)")</span>
<span class="n">echo</span><span class="w"> </span><span class="ss">"CREATE TABLE y (year int, age int, name text)"</span>
<span class="n">ok</span>
<span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script <(echo "INSERT INTO y VALUES (2010, 38, 'Gary')")</span>
<span class="n">echo</span><span class="w"> </span><span class="ss">"INSERT INTO y VALUES (2010, 38, 'Gary')"</span>
<span class="n">ok</span>
<span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script <(echo "INSERT INTO y VALUES (2021, 92, 'Teej')")</span>
<span class="n">echo</span><span class="w"> </span><span class="ss">"INSERT INTO y VALUES (2021, 92, 'Teej')"</span>
<span class="n">ok</span>
<span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script <(echo "INSERT INTO y VALUES (1994, 18, 'Mel')")</span>
<span class="n">echo</span><span class="w"> </span><span class="ss">"INSERT INTO y VALUES (1994, 18, 'Mel')"</span>
<span class="n">ok</span>
<span class="o">#</span><span class="w"> </span><span class="n">Basic</span><span class="w"> </span><span class="n">query</span>
<span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script <(echo "SELECT name, age, year FROM y")</span>
<span class="n">echo</span><span class="w"> </span><span class="ss">"SELECT name, age, year FROM y"</span>
<span class="o">|</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="n">age</span><span class="w"> </span><span class="o">|</span><span class="k">year</span><span class="w"> </span><span class="o">|</span>
<span class="o">+</span><span class="w"> </span><span class="o">====</span><span class="w"> </span><span class="o">+===</span><span class="w"> </span><span class="o">+====</span><span class="w"> </span><span class="o">+</span>
<span class="o">|</span><span class="w"> </span><span class="n">Mel</span><span class="w"> </span><span class="o">|</span><span class="mi">18</span><span class="w"> </span><span class="o">|</span><span class="mi">1994</span><span class="w"> </span><span class="o">|</span>
<span class="o">|</span><span class="w"> </span><span class="n">Gary</span><span class="w"> </span><span class="o">|</span><span class="mi">38</span><span class="w"> </span><span class="o">|</span><span class="mi">2010</span><span class="w"> </span><span class="o">|</span>
<span class="o">|</span><span class="w"> </span><span class="n">Teej</span><span class="w"> </span><span class="o">|</span><span class="mi">92</span><span class="w"> </span><span class="o">|</span><span class="mi">2021</span><span class="w"> </span><span class="o">|</span>
<span class="o">#</span><span class="w"> </span><span class="k">With</span><span class="w"> </span><span class="k">WHERE</span>
<span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script <(echo "SELECT name, year, age FROM y WHERE age < 40")</span>
<span class="n">echo</span><span class="w"> </span><span class="ss">"SELECT name, year, age FROM y WHERE age < 40"</span>
<span class="o">|</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="k">year</span><span class="w"> </span><span class="o">|</span><span class="n">age</span><span class="w"> </span><span class="o">|</span>
<span class="o">+</span><span class="w"> </span><span class="o">====</span><span class="w"> </span><span class="o">+====</span><span class="w"> </span><span class="o">+===</span><span class="w"> </span><span class="o">+</span>
<span class="o">|</span><span class="w"> </span><span class="n">Mel</span><span class="w"> </span><span class="o">|</span><span class="mi">1994</span><span class="w"> </span><span class="o">|</span><span class="mi">18</span><span class="w"> </span><span class="o">|</span>
<span class="o">|</span><span class="w"> </span><span class="n">Gary</span><span class="w"> </span><span class="o">|</span><span class="mi">2010</span><span class="w"> </span><span class="o">|</span><span class="mi">38</span><span class="w"> </span><span class="o">|</span>
<span class="o">#</span><span class="w"> </span><span class="k">With</span><span class="w"> </span><span class="n">operations</span>
<span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script <(echo "SELECT 'Name: ' || name, year + 30, age FROM y WHERE age < 40")</span>
<span class="n">echo</span><span class="w"> </span><span class="ss">"SELECT 'Name: ' || name, year + 30, age FROM y WHERE age < 40"</span>
<span class="o">|</span><span class="w"> </span><span class="k">unknown</span><span class="w"> </span><span class="o">|</span><span class="k">unknown</span><span class="w"> </span><span class="o">|</span><span class="n">age</span><span class="w"> </span><span class="o">|</span>
<span class="o">+</span><span class="w"> </span><span class="o">=======</span><span class="w"> </span><span class="o">+=======</span><span class="w"> </span><span class="o">+===</span><span class="w"> </span><span class="o">+</span>
<span class="o">|</span><span class="w"> </span><span class="n">Name</span><span class="p">:</span><span class="w"> </span><span class="n">Mel</span><span class="w"> </span><span class="o">|</span><span class="mi">2024</span><span class="w"> </span><span class="o">|</span><span class="mi">18</span><span class="w"> </span><span class="o">|</span>
<span class="o">|</span><span class="w"> </span><span class="n">Name</span><span class="p">:</span><span class="w"> </span><span class="n">Gary</span><span class="w"> </span><span class="o">|</span><span class="mi">2040</span><span class="w"> </span><span class="o">|</span><span class="mi">38</span><span class="w"> </span><span class="o">|</span>
</pre></div>
<p>This post is standalone (except for the RocksDB layer which I <a href="https://notes.eatonphil.com/zigrocks.html">wrote
about here</a>) but it builds on a
number of ideas I've explored that you may be interested in:</p>
<ul>
<li><a href="https://notes.eatonphil.com/whats-the-big-deal-about-key-value-databases.html">What's the big deal about key-value databases like FoundationDB and RocksDB?</a></li>
<li><a href="https://notes.eatonphil.com/distributed-postgres.html">Let's build a distributed Postgres proof of concept</a></li>
<li><a href="https://notes.eatonphil.com/documentdb.html">Writing a document database from scratch in Go</a></li>
<li>And the grandfather series, <a href="https://notes.eatonphil.com/database-basics.html">Writing a SQL database from scratch in Go</a></li>
</ul>
<p>This project is mostly a port of my <a href="https://notes.eatonphil.com/database-basics.html">SQL database from scratch in
Go</a> project, but
unlike that series this project will have persistent storage via
RocksDB.</p>
<p>And unlike that post, this project is written in Zig!</p>
<p>Let's get started. :)</p>
<h3 id="components">Components</h3><p>We're going to split up the project into the following major
components:</p>
<ul>
<li>Lexing</li>
<li>Parsing</li>
<li>Storage<ul>
<li>RocksDB</li>
</ul>
</li>
<li>Execution</li>
<li>Entrypoint (<code>main</code>)</li>
</ul>
<p><em>Lexing</em> takes a query and breaks it into an array of tokens.</p>
<p><em>Parsing</em> takes the lexed array of tokens and pattern matches them into
an abstract syntax tree (AST).</p>
<p><em>Storage</em> maps high-level SQL entities like tables and rows into bytes
that can be easily stored on disk. And it handles recovering
high-level tables and rows from bytes on disk.</p>
<p>Invisible to users of the <em>Storage</em> component is <em>RocksDB</em>, which is how
the bytes are actually stored on disk. <a href="http://rocksdb.org/">RocksDB</a> is a persistent store
that maps arbitrary byte keys to arbitrary byte values. We'll use it
for storing and recovering both table metadata and actual row data.</p>
<p><em>Execution</em> takes a query AST and executes it against <em>Storage</em>,
potentially returning result rows.</p>
<p>These terms are a vast simplification of real-world database
design. But they are a helpful structure to have even in a project
this small.</p>
<h3 id="memory-management">Memory Management</h3><p>Zig doesn't have a garbage collector. Mitchell Hashimoto <a href="https://github.com/mitchellh/zig-libgc">wrote
bindings to Boehm GC</a>. But Zig
also has a <a href="https://ziglang.org/documentation/master/#toc-Choosing-an-Allocator">builtin Arena
allocator</a>
which is perfect for this simple project.</p>
<p>The <code>main</code> function will create the arena and pass it to each
component, where they can do allocations as they please. At the end of
<code>main</code>, the entire arena will be freed at once.</p>
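<p>As a rough illustration of that pattern, here is a minimal sketch.
It is not the project's actual <code>main</code>:
<code>describeTable</code> is a hypothetical stand-in for a real
component, and backing the arena with
<code>std.heap.page_allocator</code> is an assumption made for the
example.</p>
<div class="highlight"><pre>const std = @import("std");

// Hypothetical stand-in for a component such as the lexer or parser:
// it allocates from whatever allocator it is handed and never frees.
fn describeTable(allocator: std.mem.Allocator, name: []const u8) ![]u8 {
    return std.fmt.allocPrint(allocator, "table {s}", .{name});
}

pub fn main() !void {
    // Back the arena with the page allocator (an assumption for this
    // sketch; any backing allocator would do).
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    // Frees every allocation made through the arena in one call.
    defer arena.deinit();

    const allocator = arena.allocator();
    const a = try describeTable(allocator, "y");
    const b = try describeTable(allocator, "z");
    std.debug.print("{s}, {s}\n", .{ a, b });
}
</pre></div>
<p>Neither string is freed individually. The single
<code>defer arena.deinit()</code> releases both when
<code>main</code> returns, which is what lets the components allocate
without tracking ownership.</p>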
<p>The only other place where we must do manual memory management is in
the RocksDB wrapper. But <a href="https://notes.eatonphil.com/zigrocks.html">I've already
covered</a> that in a separate
post.</p>
<h3 id="zig-specifics">Zig Specifics</h3><p>I'm not going to cover the basics of Zig syntax. If you are new to
Zig, read <a href="https://notes.eatonphil.com/zigrocks.html">this</a> first!
(It's short.)</p>
<p>Now that we've got the basic idea, we can start coding!</p>
<h3 id="types-(<code>types.zig</code>,-10-loc)">Types (<code>types.zig</code>, 10 LoC)</h3><p>Let's create a few helper types that we'll use in the rest of the
code.</p>
<div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">;</span>
<span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Error</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">String</span><span class="p">;</span>
<span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="kr">comptime</span><span class="w"> </span><span class="n">T</span><span class="o">:</span><span class="w"> </span><span class="kt">type</span><span class="p">)</span><span class="w"> </span><span class="kt">type</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">union</span><span class="p">(</span><span class="k">enum</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">val</span><span class="o">:</span><span class="w"> </span><span class="n">T</span><span class="p">,</span>
<span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="n">Error</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
<p>That's it. :) Makes things a little more readable.</p>
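<p>To show how a caller might consume <code>Result</code>, here is a
small sketch. The type definitions are repeated so the example stands
alone, and <code>parseAge</code> is a hypothetical helper that is not
part of the project.</p>
<div class="highlight"><pre>const std = @import("std");

pub const String = []const u8;
pub const Error = String;

pub fn Result(comptime T: type) type {
    return union(enum) {
        val: T,
        err: Error,
    };
}

// Hypothetical helper: wraps integer parsing so that a failure
// carries a readable message instead of a bare Zig error.
fn parseAge(text: String) Result(u8) {
    const n = std.fmt.parseInt(u8, text, 10) catch {
        return .{ .err = "age must be an integer between 0 and 255" };
    };
    return .{ .val = n };
}

pub fn main() void {
    // A tagged union forces the caller to handle both cases.
    switch (parseAge("38")) {
        .val => |age| std.debug.print("age: {d}\n", .{age}),
        .err => |msg| std.debug.print("error: {s}\n", .{msg}),
    }
}
</pre></div>
<p>The trade-off compared to Zig's native error unions is that the
error is a free-form message rather than a member of an error set,
which is convenient for reporting problems back to the user.</p>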
<h3 id="lexing-(<code>lex.zig</code>,-308-loc)">Lexing (<code>lex.zig</code>, 308 LoC)</h3><p>Lexing turns a query string into an array of tokens.</p>
<p>There are a few <em>kinds</em> of tokens we'll define:</p>
<ul>
<li>Keywords (like <code>CREATE</code>, <code>true</code>, <code>false</code>, <code>null</code>)</li>
<li>Syntax (commas, parentheses, operators, and all other builtin symbols)</li>
<li>Strings</li>
<li>Integers</li>
<li>Identifiers</li>
</ul>
<p>And not listed there but important to <em>skip past</em> is whitespace.</p>
<p>Let's turn this into a Zig struct!</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">Error</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"types.zig"</span><span class="p">).</span><span class="n">Error</span><span class="p">;</span>
<span class="kr">const</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"types.zig"</span><span class="p">).</span><span class="n">String</span><span class="p">;</span>
<span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">start</span><span class="o">:</span><span class="w"> </span><span class="kt">u64</span><span class="p">,</span>
<span class="w"> </span><span class="n">end</span><span class="o">:</span><span class="w"> </span><span class="kt">u64</span><span class="p">,</span>
<span class="w"> </span><span class="n">kind</span><span class="o">:</span><span class="w"> </span><span class="n">Kind</span><span class="p">,</span>
<span class="w"> </span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">enum</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Keywords</span>
<span class="w"> </span><span class="n">select_keyword</span><span class="p">,</span>
<span class="w"> </span><span class="n">create_table_keyword</span><span class="p">,</span>
<span class="w"> </span><span class="n">insert_keyword</span><span class="p">,</span>
<span class="w"> </span><span class="n">values_keyword</span><span class="p">,</span>
<span class="w"> </span><span class="n">from_keyword</span><span class="p">,</span>
<span class="w"> </span><span class="n">where_keyword</span><span class="p">,</span>
<span class="w"> </span><span class="c1">// Operators</span>
<span class="w"> </span><span class="n">plus_operator</span><span class="p">,</span>
<span class="w"> </span><span class="n">equal_operator</span><span class="p">,</span>
<span class="w"> </span><span class="n">lt_operator</span><span class="p">,</span>
<span class="w"> </span><span class="n">concat_operator</span><span class="p">,</span>
<span class="w"> </span><span class="c1">// Other syntax</span>
<span class="w"> </span><span class="n">left_paren_syntax</span><span class="p">,</span>
<span class="w"> </span><span class="n">right_paren_syntax</span><span class="p">,</span>
<span class="w"> </span><span class="n">comma_syntax</span><span class="p">,</span>
<span class="w"> </span><span class="c1">// Literals</span>
<span class="w"> </span><span class="n">identifier</span><span class="p">,</span>
<span class="w"> </span><span class="n">integer</span><span class="p">,</span>
<span class="w"> </span><span class="n">string</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">)</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">source</span><span class="p">[</span><span class="n">self</span><span class="p">.</span><span class="n">start</span><span class="p">..</span><span class="n">self</span><span class="p">.</span><span class="n">end</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Using an <code>enum</code> helps us with type safety. And since we're storing
location in the token, we can build a nice debug function for when
lexing or parsing fails.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">debug</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">line</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">column</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">lineStartIndex</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">lineEndIndex</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">source</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="se">'\n'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">lineStartIndex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">start</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Find the end of the line</span>
<span class="w"> </span><span class="n">lineEndIndex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span>
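<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">lineEndIndex</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">lineEndIndex</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="se">'\n'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>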
<span class="w"> </span><span class="n">lineEndIndex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lineEndIndex</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{s}</span><span class="se">\n</span><span class="s">Near line {}, column {}.</span><span class="se">\n</span><span class="s">{s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">msg</span><span class="p">,</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">column</span><span class="p">,</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">lineStartIndex</span><span class="p">..</span><span class="n">lineEndIndex</span><span class="p">]</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">column</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">" "</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"^ Near here</span><span class="se">\n\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">};</span>
</pre></div>
<p>And similarly, let's add a debug helper for when we're dealing with an
array of tokens.</p>
<div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">preferredIndex</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">preferredIndex</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">debug</span><span class="p">(</span><span class="n">msg</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
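<p>One edge case worth keeping in mind: the clamping loop in
<code>debug</code> assumes there is at least one token. Here is a minimal
sketch of the same clamping with the empty case made explicit. The name
<code>clampIndex</code> is hypothetical and not part of the code in this
post.</p>

```zig
const std = @import("std");

// Hypothetical helper: picks the index debug() would end up using, or null
// when there are no tokens to point at.
fn clampIndex(len: usize, preferredIndex: usize) ?usize {
    if (len == 0) {
        return null;
    }
    if (preferredIndex >= len) {
        return len - 1;
    }
    return preferredIndex;
}

test "preferred index is clamped to the last token" {
    try std.testing.expect(clampIndex(3, 1).? == 1);
    try std.testing.expect(clampIndex(3, 10).? == 2);
    try std.testing.expect(clampIndex(0, 0) == null);
}
```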
<h4 id="token-<>-string-mapping">Token <> String Mapping</h4><p>Before we get too far from the <code>Token</code> definition, let's define a mapping
from the <code>Token.Kind</code> enum to strings we can see in a query.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">Builtin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span>
<span class="w"> </span><span class="n">kind</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">,</span>
<span class="p">};</span>
<span class="c1">// These must be sorted by length of the name text, descending, for lexKeyword.</span>
<span class="kr">var</span><span class="w"> </span><span class="n">BUILTINS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="n">_</span><span class="p">]</span><span class="n">Builtin</span><span class="p">{</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"CREATE TABLE"</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">create_table_keyword</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"INSERT INTO"</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">insert_keyword</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"SELECT"</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">select_keyword</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"VALUES"</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">values_keyword</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"WHERE"</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">where_keyword</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"FROM"</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">from_keyword</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"||"</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">concat_operator</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"="</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">equal_operator</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"+"</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">plus_operator</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"<"</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">lt_operator</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"("</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">left_paren_syntax</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">")"</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">right_paren_syntax</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">","</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">comma_syntax</span><span class="w"> </span><span class="p">},</span>
<span class="p">};</span>
</pre></div>
<p>We'll use this in a few lexing functions below.</p>
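<p>Since <code>lexKeyword</code> depends on that ordering, it may be worth
checking it in a test. The following is only a sketch: it assumes
<code>String</code> is an alias for <code>[]const u8</code>, and
<code>sortedByLengthDescending</code> is a hypothetical helper that works on
a plain list of names rather than on <code>BUILTINS</code> itself.</p>

```zig
const std = @import("std");

// Assumption: String is an alias for a byte slice.
const String = []const u8;

// Hypothetical helper: true when no name is longer than the one before it.
fn sortedByLengthDescending(names: []const String) bool {
    var i: usize = 1;
    while (i < names.len) : (i += 1) {
        if (names[i].len > names[i - 1].len) {
            return false;
        }
    }
    return true;
}

test "builtin names are ordered longest first" {
    const names = [_]String{
        "CREATE TABLE", "INSERT INTO", "SELECT", "VALUES", "WHERE", "FROM",
        "||",           "=",           "+",      "<",      "(",     ")",
        ",",
    };
    try std.testing.expect(sortedByLengthDescending(&names));

    // A shorter name ahead of a longer one breaks the ordering.
    const unsorted = [_]String{ "=", "||" };
    try std.testing.expect(!sortedByLengthDescending(&unsorted));
}
```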
<h4 id="whitespace">Whitespace</h4><p>Outside of tokens, we need to be able to skip past whitespace.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">eatWhitespace</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">res</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span>
<span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">res</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">' '</span><span class="w"> </span><span class="k">or</span>
<span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">res</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="se">'\n'</span><span class="w"> </span><span class="k">or</span>
<span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">res</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="se">'\t'</span><span class="w"> </span><span class="k">or</span>
<span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">res</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="se">'\r'</span><span class="p">))</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">res</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>All lexing functions will look like this. They'll take the source as
one argument and a cursor to the current index in the source as
another, and they'll return the position to continue lexing from.</p>
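<p>To make the cursor convention concrete, here is a small sketch of how
a caller sees <code>eatWhitespace</code>. It restates the function with the
bounds check folded into the loop condition, so that a cursor already at
the end of the source is safe, and it assumes <code>String</code> is an
alias for <code>[]const u8</code>.</p>

```zig
const std = @import("std");

// Assumption: String is an alias for a byte slice.
const String = []const u8;

// Bounds-checked restatement of eatWhitespace, for illustration only.
fn eatWhitespace(source: String, index: usize) usize {
    var res = index;
    while (res < source.len and
        (source[res] == ' ' or source[res] == '\n' or
        source[res] == '\t' or source[res] == '\r'))
    {
        res += 1;
    }
    return res;
}

test "eatWhitespace advances the cursor past whitespace" {
    const source = "  \n\tSELECT 1";
    // Two spaces, a newline and a tab are skipped.
    try std.testing.expect(eatWhitespace(source, 0) == 4);
    // A cursor already on a token does not move.
    try std.testing.expect(eatWhitespace(source, 4) == 4);
    // A cursor at the end of the source is returned unchanged.
    try std.testing.expect(eatWhitespace(source, source.len) == source.len);
}
```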
<h4 id="keywords">Keywords</h4><p>Let's handle lexing keyword tokens next. Keywords are case
insensitive. Zig's standard library has <code>std.ascii.eqlIgnoreCase</code> for
ASCII text, but the comparison is short enough to write by hand. So
let's write that first.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">asciiCaseInsensitiveEqual</span><span class="p">(</span><span class="n">left</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">right</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">min</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">left</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">right</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">min</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">right</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">min</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">_</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">l</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">left</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">l</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">97</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">l</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="mi">122</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">l</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">l</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">32</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">right</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">97</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="mi">122</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">32</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">l</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">r</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
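<p>Unfortunately it only supports ASCII for now. It also compares only up
to the length of the shorter string, so callers must pass slices of
equal length, as <code>lexKeyword</code> does below.</p>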
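<p>To see how the hand-written loop relates to the standard library's
<code>std.ascii.eqlIgnoreCase</code>, here is a small sketch that runs both
side by side. The loop is restated with <code>while</code> and
<code>std.ascii.toUpper</code> so the sketch does not depend on a
particular version of the <code>for</code> syntax. <code>String</code> is
assumed to be an alias for <code>[]const u8</code>.</p>

```zig
const std = @import("std");

// Assumption: String is an alias for a byte slice.
const String = []const u8;

// Restatement of the hand-written comparison: it only looks at as many
// bytes as the shorter of the two strings has.
fn asciiCaseInsensitiveEqual(left: String, right: String) bool {
    const n = if (right.len < left.len) right.len else left.len;
    var i: usize = 0;
    while (i < n) : (i += 1) {
        if (std.ascii.toUpper(left[i]) != std.ascii.toUpper(right[i])) {
            return false;
        }
    }
    return true;
}

test "equal-length slices compare the same way in both functions" {
    try std.testing.expect(asciiCaseInsensitiveEqual("select", "SELECT"));
    try std.testing.expect(std.ascii.eqlIgnoreCase("select", "SELECT"));
    try std.testing.expect(!asciiCaseInsensitiveEqual("SELECT", "VALUES"));
    try std.testing.expect(!std.ascii.eqlIgnoreCase("SELECT", "VALUES"));
}

test "only the hand-written version treats a prefix as equal" {
    try std.testing.expect(asciiCaseInsensitiveEqual("SEL", "SELECT"));
    try std.testing.expect(!std.ascii.eqlIgnoreCase("SEL", "SELECT"));
}
```

<p>The prefix behavior is harmless as long as every caller slices the
source to the length of the keyword before comparing.</p>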
<p>Now we can write a simple longest-matching-substring function. It is
simple because the keyword mapping we set up above is already ordered
by length descending.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">lexKeyword</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">nextPosition</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Token</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">longestLen</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">select_keyword</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">BUILTINS</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">builtin</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">asciiCaseInsensitiveEqual</span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">index</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">len</span><span class="p">],</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">longestLen</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">len</span><span class="p">;</span>
<span class="w"> </span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">kind</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// First match is the longest match</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">longestLen</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">longestLen</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">longestLen</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">kind</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
<p>That's it!</p>
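<p>Here is a stripped-down sketch of the same first-match-wins idea that
can be run on its own. It returns only the matched length instead of a
<code>Token</code>, and it uses <code>std.ascii.eqlIgnoreCase</code> for the
comparison. The <code>names</code> list and the <code>longestMatch</code>
function are hypothetical, and <code>"&lt;="</code> is not in
<code>BUILTINS</code>; it is there only to show why longer names must come
first.</p>

```zig
const std = @import("std");

// Assumption: String is an alias for a byte slice.
const String = []const u8;

// Hypothetical, trimmed-down list, ordered by length descending.
const names = [_]String{ "SELECT", "FROM", "<=", "<" };

// Returns the length of the first (and therefore longest) name that matches
// at `index`, or 0 when nothing matches.
fn longestMatch(source: String, index: usize) usize {
    for (names) |name| {
        // Skip names that would run past the end of the source.
        if (index + name.len > source.len) {
            continue;
        }
        if (std.ascii.eqlIgnoreCase(source[index .. index + name.len], name)) {
            return name.len;
        }
    }
    return 0;
}

test "first match is the longest match" {
    try std.testing.expect(longestMatch("select a", 0) == 6);
    // "<=" is tried before "<", so both characters are consumed.
    try std.testing.expect(longestMatch("a <= 1", 2) == 2);
    try std.testing.expect(longestMatch("a < 1", 2) == 1);
    try std.testing.expect(longestMatch("a < 1", 0) == 0);
    // A name that ends exactly at the end of the source still matches.
    try std.testing.expect(longestMatch("from", 0) == 4);
}
```

<p>If <code>"&lt;"</code> came before <code>"&lt;="</code> in the list, the
second case would stop after one character and leave a stray
<code>=</code> behind.</p>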
<h4 id="integers">Integers</h4><p>For integers we read through the source until we stop seeing
decimal digits. Obviously this is a subset of what people consider
integers, but it will do for now!</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">lexInteger</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">nextPosition</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Token</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s">'0'</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="s">'9'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">start</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">end</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">start</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">integer</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
<h4 id="strings">Strings</h4><p>Strings are enclosed in single quotes.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">lexString</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">nextPosition</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Token</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="se">'\''</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span>
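<span class="w"> </span><span class="c1">// Scan to the closing quote without reading past the end of the input.</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="se">'\''</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// An unterminated string is not a valid token.</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Skip the closing quote.</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>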
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">start</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">end</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">start</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">string</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
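<p>To make the bookkeeping concrete, here is a minimal, self-contained sketch of the same idea: the token's range covers only the characters between the quotes, while the next position points just past the closing quote. The names <code>lexQuoted</code> and <code>Quoted</code> are hypothetical and exist only for this example; it returns a slice rather than the <code>Token</code> used above.</p>

```zig
const std = @import("std");

// Hypothetical result type for this sketch: it carries the string's contents
// as a slice instead of the Token used in the post.
const Quoted = struct { nextPosition: usize, contents: ?[]const u8 };

fn lexQuoted(source: []const u8, index: usize) Quoted {
    if (index >= source.len or source[index] != '\'') {
        return .{ .nextPosition = 0, .contents = null };
    }
    const start = index + 1; // First character after the opening quote.
    var end = start;
    while (end < source.len and source[end] != '\'') {
        end += 1;
    }
    // Reject unterminated strings, and empty ones as the post's lexer does.
    if (end >= source.len or start == end) {
        return .{ .nextPosition = 0, .contents = null };
    }
    // The contents exclude both quotes; the next position is past the closing quote.
    return .{ .nextPosition = end + 1, .contents = source[start..end] };
}

pub fn main() void {
    const res = lexQuoted("'hello' rest", 0);
    if (res.contents) |contents| {
        std.debug.print("contents: {s}, next position: {d}\n", .{ contents, res.nextPosition });
    }
}
```

<p>Keeping the quotes out of the stored range means later stages can use the string's contents directly without stripping anything.</p>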
<h4 id="identifiers">Identifiers</h4><p>Identifiers for this project are runs of ASCII letters; the lexer also
accepts <code>*</code> as an identifier character. We could
support more by also accepting strings enclosed in double quotes.
But I'll leave that as an exercise for the reader.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">lexIdentifier</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">nextPosition</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Token</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span>
<span class="w"> </span><span class="p">((</span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s">'a'</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="s">'z'</span><span class="p">)</span><span class="w"> </span><span class="k">or</span>
<span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s">'A'</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="s">'Z'</span><span class="p">)</span><span class="w"> </span><span class="k">or</span>
<span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">'*'</span><span class="p">)))</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">start</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">end</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">start</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
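<p>For the double-quote exercise, one possible approach is sketched below. It is only an illustration under my own assumptions, not the project's code: <code>lexIdentifierSketch</code>, <code>IdentifierSpan</code> and <code>isLetter</code> are hypothetical names, the result is a slice rather than a <code>Token</code>, and the <code>*</code> case is left out to keep it short.</p>

```zig
const std = @import("std");

// Hypothetical result type for this sketch; the post's lexers return a Token instead.
const IdentifierSpan = struct { nextPosition: usize, name: ?[]const u8 };

fn isLetter(c: u8) bool {
    return (c >= 'a' and c <= 'z') or (c >= 'A' and c <= 'Z');
}

// Accepts either a run of letters or a double-quoted name such as "my table".
fn lexIdentifierSketch(source: []const u8, index: usize) IdentifierSpan {
    if (index >= source.len) {
        return .{ .nextPosition = 0, .name = null };
    }
    if (source[index] == '"') {
        const start = index + 1;
        var close = start;
        while (close < source.len and source[close] != '"') {
            close += 1;
        }
        // Reject unterminated or empty quoted names.
        if (close >= source.len or close == start) {
            return .{ .nextPosition = 0, .name = null };
        }
        // The name excludes the quotes; lexing resumes after the closing quote.
        return .{ .nextPosition = close + 1, .name = source[start..close] };
    }
    var end = index;
    while (end < source.len and isLetter(source[end])) {
        end += 1;
    }
    if (end == index) {
        return .{ .nextPosition = 0, .name = null };
    }
    return .{ .nextPosition = end, .name = source[index..end] };
}

pub fn main() void {
    const res = lexIdentifierSketch("\"my table\" x", 0);
    if (res.name) |name| {
        std.debug.print("identifier: {s}\n", .{name});
    }
}
```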
<h4 id="<code>lex</code>"><code>lex</code></h4><p>Now we can pull together all these helper functions in a public
entrypoint for lexing.</p>
<p>It loops through the query string, skipping whitespace and trying each
token lexer in turn: keyword, integer, string, then identifier. It
continues until it reaches the end of the query string. If no lexer
matches at the current position, it returns an error.</p>
<div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">lex</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">Token</span><span class="p">))</span><span class="w"> </span><span class="o">?</span><span class="n">Error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eatWhitespace</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">keywordRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexKeyword</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">keywordRes</span><span class="p">.</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">token</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Failed to allocate space for keyword token"</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">keywordRes</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">integerRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexInteger</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">integerRes</span><span class="p">.</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">token</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Failed to allocate space for integer token"</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">integerRes</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">stringRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexString</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">stringRes</span><span class="p">.</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">token</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Failed to allocate space for string token"</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stringRes</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">identifierRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexIdentifier</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">identifierRes</span><span class="p">.</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">token</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Failed to allocate space for identifier token"</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">identifierRes</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">"Last good token.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Bad token"</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
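<p>The shape of that loop is easier to see in miniature. The sketch below is a condensed stand-in, not the project's lexer: <code>lexSketch</code>, <code>Tok</code>, <code>Kind</code> and the three <code>...End</code> helpers are hypothetical names, it knows only two lowercase keywords, and it writes into a caller-supplied buffer instead of an <code>ArrayList</code>. What it keeps is the structure: skip whitespace, try each lexer in a fixed order, advance on the first match, and fail if nothing matches.</p>

```zig
const std = @import("std");

// Hypothetical miniature types for this sketch; the post's Token carries more fields.
const Kind = enum { keyword, integer, identifier };
const Tok = struct { kind: Kind, text: []const u8 };

// Each helper returns the position just past its match, or null if it does not match.
fn keywordEnd(source: []const u8, i: usize) ?usize {
    const keywords = [_][]const u8{ "select", "from" };
    for (keywords) |kw| {
        if (std.mem.startsWith(u8, source[i..], kw)) return i + kw.len;
    }
    return null;
}

fn integerEnd(source: []const u8, i: usize) ?usize {
    var end = i;
    while (end < source.len and source[end] >= '0' and source[end] <= '9') end += 1;
    return if (end == i) null else end;
}

fn identifierEnd(source: []const u8, i: usize) ?usize {
    var end = i;
    while (end < source.len and
        ((source[end] >= 'a' and source[end] <= 'z') or
        (source[end] >= 'A' and source[end] <= 'Z'))) end += 1;
    return if (end == i) null else end;
}

// Writes tokens into `out` and returns how many were produced.
// Returns null on a bad token or if `out` is too small.
fn lexSketch(source: []const u8, out: []Tok) ?usize {
    var count: usize = 0;
    var i: usize = 0;
    while (true) {
        while (i < source.len and source[i] == ' ') i += 1; // Eat whitespace.
        if (i >= source.len) break;
        if (count >= out.len) return null;

        // Order matters: keywords are tried before identifiers.
        if (keywordEnd(source, i)) |end| {
            out[count] = .{ .kind = .keyword, .text = source[i..end] };
            i = end;
        } else if (integerEnd(source, i)) |end| {
            out[count] = .{ .kind = .integer, .text = source[i..end] };
            i = end;
        } else if (identifierEnd(source, i)) |end| {
            out[count] = .{ .kind = .identifier, .text = source[i..end] };
            i = end;
        } else {
            return null; // Bad token.
        }
        count += 1;
    }
    return count;
}

pub fn main() void {
    var buf: [8]Tok = undefined;
    if (lexSketch("select 1 from t", &buf)) |count| {
        for (buf[0..count]) |tok| {
            std.debug.print("{s}: {s}\n", .{ @tagName(tok.kind), tok.text });
        }
    }
}
```

<p>Trying keywords first is what keeps a word like <code>select</code> from being swallowed by the identifier lexer, which would otherwise match it just as happily.</p>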
<p>That's it for lexing! Now we can do parsing.</p>
<h3 id="parsing-(<code>parse.zig</code>,-407-loc)">Parsing (<code>parse.zig</code>, 407 LoC)</h3><p>Parsing takes the array of tokens from the lexing stage and builds
from it a tree that matches a predefined abstract syntax tree
(AST).</p>
<p>If it can't discover a valid tree from the array of tokens, it fails.</p>
<p>Let's set up the basics of the <code>Parser</code> struct:</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">lex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"lex.zig"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">Result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"types.zig"</span><span class="p">).</span><span class="n">Result</span><span class="p">;</span>
<span class="kr">const</span><span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">Token</span><span class="p">;</span>
<span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Parser</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">)</span><span class="w"> </span><span class="n">Parser</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Parser</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">kind</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">kind</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
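<p>The bounds check in <code>expectTokenKind</code> is what lets the parser peek at "the next token" without first checking whether one exists. Here is a small standalone sketch of that behavior. The cut-down <code>Token</code> and the <code>select_keyword</code> kind are assumptions made for this example, and the helper takes <code>[]const Token</code> so it can be called on a constant array.</p>

```zig
const std = @import("std");

// Hypothetical cut-down Token for this sketch; the post's Token also
// records where the token sits in the source.
const Token = struct {
    kind: Kind,

    const Kind = enum { select_keyword, identifier, integer, string };
};

// Same logic as the helper above: a lookup past the end is simply "no match".
fn expectTokenKind(tokens: []const Token, index: usize, kind: Token.Kind) bool {
    if (index >= tokens.len) {
        return false;
    }
    return tokens[index].kind == kind;
}

pub fn main() void {
    const tokens = [_]Token{
        .{ .kind = .select_keyword },
        .{ .kind = .identifier },
    };
    // Index 2 is one past the last token, so this reports false instead of crashing.
    std.debug.print("{}\n", .{expectTokenKind(&tokens, 2, .identifier)});
}
```

<p>Because running off the end of the token array reads as a plain mismatch, the parsing functions can treat truncated input the same way they treat any other unexpected token.</p>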
<h4 id="expressions">Expressions</h4><p>Expressions are at the bottom of the syntax tree.</p>
<p>They can be:</p>
<ul>
<li>Literals (like strings, integers, booleans, etc.)</li>
<li>Or binary operations</li>
</ul>
<p>Let's define these in Zig:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">BinaryOperationAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">operator</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span>
<span class="w"> </span><span class="n">left</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ExpressionAST</span><span class="p">,</span>
<span class="w"> </span><span class="n">right</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ExpressionAST</span><span class="p">,</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">BinaryOperationAST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">left</span><span class="p">.</span><span class="n">print</span><span class="p">();</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">" {s} "</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">self</span><span class="p">.</span><span class="n">operator</span><span class="p">.</span><span class="n">string</span><span class="p">()});</span>
<span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">right</span><span class="p">.</span><span class="n">print</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">ExpressionAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">union</span><span class="p">(</span><span class="k">enum</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">literal</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span>
<span class="w"> </span><span class="n">binary_operation</span><span class="o">:</span><span class="w"> </span><span class="n">BinaryOperationAST</span><span class="p">,</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">ExpressionAST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">literal</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">literal</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">literal</span><span class="p">.</span><span class="n">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">string</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"'{s}'"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">literal</span><span class="p">.</span><span class="n">string</span><span class="p">()}),</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"{s}"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">literal</span><span class="p">.</span><span class="n">string</span><span class="p">()}),</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">binary_operation</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">binary_operation</span><span class="p">.</span><span class="n">print</span><span class="p">(),</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
<p>Now we can attempt to parse either of these from an array of tokens.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">parseExpression</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ast</span><span class="o">:</span><span class="w"> </span><span class="n">ExpressionAST</span><span class="p">,</span>
<span class="w"> </span><span class="n">nextPosition</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">e</span><span class="o">:</span><span class="w"> </span><span class="n">ExpressionAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">integer</span><span class="p">)</span><span class="w"> </span><span class="k">or</span>
<span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">)</span><span class="w"> </span><span class="k">or</span>
<span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">string</span><span class="p">))</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ExpressionAST</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">literal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"No expression"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">equal_operator</span><span class="p">)</span><span class="w"> </span><span class="k">or</span>
<span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">lt_operator</span><span class="p">)</span><span class="w"> </span><span class="k">or</span>
<span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">plus_operator</span><span class="p">)</span><span class="w"> </span><span class="k">or</span>
<span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">concat_operator</span><span class="p">))</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">newE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ExpressionAST</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">binary_operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">BinaryOperationAST</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">operator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">],</span>
<span class="w"> </span><span class="p">.</span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">ExpressionAST</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for left expression."</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">ExpressionAST</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for right expression."</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">newE</span><span class="p">.</span><span class="n">binary_operation</span><span class="p">.</span><span class="n">left</span><span class="p">.</span><span class="o">*</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">e</span><span class="p">;</span>
<span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">newE</span><span class="p">;</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseExpression</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">e</span><span class="p">.</span><span class="n">binary_operation</span><span class="p">.</span><span class="n">right</span><span class="p">.</span><span class="o">*</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">ast</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">ast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">e</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Basically, we assume it's a literal expression unless we see an
operator after it. If there's an operator after it, we call
<code>parseExpression</code> recursively to parse the right-hand side and
return a binary expression.</p>
<p>Important to note: this skips both implicit operator precedence and
explicit precedence via parentheses. Because the right-hand side is
always parsed recursively, every chain of operators nests to the right.</p>
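<p>To see what skipping precedence means in practice, here is a
minimal, self-contained sketch. It is not the parser from this post:
<code>Node</code>, <code>Tree</code>, <code>parse</code> and <code>show</code> are hypothetical
stand-ins that use plain strings for tokens and a fixed-size array in
place of an allocator. It follows the same strategy as
<code>parseExpression</code> (a literal, then optionally an operator and a
recursive call for the right side) and assumes the input never ends on
an operator.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical, simplified stand-in for ExpressionAST. Children are
// indexes into Tree.nodes instead of heap-allocated pointers.
const Node = union(enum) {
    literal: []const u8,
    binary: struct { op: []const u8, left: usize, right: usize },
};

// Fixed-size node storage so the sketch needs no allocator.
const Tree = struct {
    nodes: [16]Node = undefined,
    len: usize = 0,

    fn add(self: *Tree, n: Node) usize {
        self.nodes[self.len] = n;
        self.len += 1;
        return self.len - 1;
    }
};

fn isOperator(text: []const u8) bool {
    const operators = [_][]const u8{ "=", "&lt;", "+", "||" };
    for (operators) |op| {
        if (std.mem.eql(u8, op, text)) return true;
    }
    return false;
}

// Same strategy as parseExpression: take one literal, and if an
// operator follows, parse everything after it as the right-hand side.
// Assumes a literal always follows an operator.
fn parse(tree: *Tree, tokens: []const []const u8, index: usize) usize {
    const left = tree.add(.{ .literal = tokens[index] });
    if (index + 1 &lt; tokens.len and isOperator(tokens[index + 1])) {
        const right = parse(tree, tokens, index + 2);
        return tree.add(.{ .binary = .{
            .op = tokens[index + 1],
            .left = left,
            .right = right,
        } });
    }
    return left;
}

// Print the tree with explicit parentheses so the grouping is visible.
fn show(tree: *const Tree, id: usize) void {
    switch (tree.nodes[id]) {
        .literal =&gt; |text| std.debug.print("{s}", .{text}),
        .binary =&gt; |b| {
            std.debug.print("(", .{});
            show(tree, b.left);
            std.debug.print(" {s} ", .{b.op});
            show(tree, b.right);
            std.debug.print(")", .{});
        },
    }
}

pub fn main() void {
    var tree = Tree{};
    const tokens = [_][]const u8{ "1", "+", "2", "=", "3" };
    const root = parse(&amp;tree, &amp;tokens, 0);
    show(&amp;tree, root);
    std.debug.print("\n", .{});
}
</pre></div>
<p>Running this should print <code>(1 + (2 = 3))</code>, whereas SQL's usual
precedence rules would group the same input as <code>(1 + 2) = 3</code>. That is
fine for the simple queries in this post, but it is worth keeping in
mind when writing test queries.</p>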
<h4 id="<code>select</code>"><code>SELECT</code></h4><p>A <code>SELECT</code> query consists of a comma-separated list of column
expressions, a <code>FROM</code> table name, and an optional <code>WHERE</code> clause
holding one more expression.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">SelectAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">columns</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">ExpressionAST</span><span class="p">,</span>
<span class="w"> </span><span class="n">from</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span>
<span class="w"> </span><span class="n">where</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">ExpressionAST</span><span class="p">,</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">SelectAST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"SELECT</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">column</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">" "</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">print</span><span class="p">();</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">","</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"FROM</span><span class="se">\n</span><span class="s"> {s}"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">self</span><span class="p">.</span><span class="n">from</span><span class="p">.</span><span class="n">string</span><span class="p">()});</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">where</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">where</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">WHERE</span><span class="se">\n</span><span class="s"> "</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="n">where</span><span class="p">.</span><span class="n">print</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
<p>To parse it we look for:</p>
<ul>
<li><code>SELECT</code></li>
<li>Then a comma-separated list of <code>ExpressionAST</code>s</li>
<li>Then a <code>FROM</code></li>
<li>Then optionally a <code>WHERE</code><ul>
<li>And then another <code>ExpressionAST</code></li>
</ul>
</li>
</ul>
<p>With the help of <code>expectTokenKind</code> and <code>parseExpression</code> this is not
too difficult to write, just a little verbose.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">parseSelect</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">select_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected SELECT keyword"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">ExpressionAST</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">select</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">SelectAST</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">where</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="c1">// Parse columns</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">from_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">comma_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected comma.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected comma."</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseExpression</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span>
<span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">ast</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for token."</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">from_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected FROM keyword after this.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected FROM keyword"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected FROM table name after this.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected FROM table name"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">select</span><span class="p">.</span><span class="n">from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">where_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// i + 1, skip past the where</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseExpression</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">select</span><span class="p">.</span><span class="n">where</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">ast</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Unexpected token."</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Did not complete parsing SELECT"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">select</span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">AST</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">select</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">select</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>That's it!</p>
<h4 id="<code>create-table</code>"><code>CREATE TABLE</code></h4><p>A <code>CREATE TABLE</code> query has a table name and a comma-separated
list of identifier pairs, one pair per column: the column's name and its kind.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">CreateTableColumnAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span>
<span class="w"> </span><span class="n">kind</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">CreateTableAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">table</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span>
<span class="w"> </span><span class="n">columns</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">CreateTableColumnAST</span><span class="p">,</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">CreateTableAST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"CREATE TABLE {s} (</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">self</span><span class="p">.</span><span class="n">table</span><span class="p">.</span><span class="n">string</span><span class="p">()});</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">column</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span>
<span class="w"> </span><span class="s">" {s} {s}"</span><span class="p">,</span>
<span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">string</span><span class="p">(),</span><span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">kind</span><span class="p">.</span><span class="n">string</span><span class="p">()</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">","</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">")</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
<p>To parse it we look for:</p>
<ul>
<li><code>CREATE TABLE</code></li>
<li>Followed by an identifier (the table name)</li>
<li>Followed by an open parenthesis</li>
<li>Followed by a comma-separated list of identifier pairs</li>
<li>Followed by a close parenthesis</li>
</ul>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">parseCreateTable</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">create_table_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected CREATE TABLE keyword"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected table name after CREATE TABLE keyword.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected CREATE TABLE name"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">CreateTableColumnAST</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">create_table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">CreateTableAST</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">],</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">left_paren_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected opening paren after CREATE TABLE name.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected opening paren"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">right_paren_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">comma_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected comma.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected comma."</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">CreateTableColumnAST</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected column name.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected identifier."</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected column type after column name.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected identifier."</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">column</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for column."</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Skip past final paren.</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Unexpected token."</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Did not complete parsing CREATE TABLE"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">create_table</span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">AST</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">create_table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">create_table</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
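<p>The comma handling in that loop is the part most worth isolating, so
here is a minimal, standalone sketch of just that piece. It is not the
project's code: <code>Kind</code>, <code>Token</code>, <code>Column</code> and
<code>parseColumnList</code> are hypothetical, pared-down stand-ins, and
<code>expectTokenKind</code> is assumed to return false when the index
is out of bounds. To avoid depending on an allocator, the sketch writes
into a caller-provided buffer and returns <code>null</code> on any
syntax error.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical stand-ins for the lexer's types (assumptions for this sketch).
const Kind = enum { identifier, comma_syntax, right_paren_syntax };
const Token = struct { kind: Kind, text: []const u8 };
const Column = struct { name: Token, kind: Token };

// Assumed behavior: an out-of-bounds index is simply "no match".
fn expectTokenKind(tokens: []const Token, index: usize, kind: Kind) bool {
    if (index >= tokens.len) return false;
    return tokens[index].kind == kind;
}

// Parses `name kind, name kind, ...` starting at `start` and stops at the
// closing paren. Returns the number of columns written to `out`, or null
// on a syntax error or when `out` is too small.
fn parseColumnList(tokens: []const Token, start: usize, out: []Column) ?usize {
    var i = start;
    var count: usize = 0;
    while (!expectTokenKind(tokens, i, Kind.right_paren_syntax)) {
        // Every column after the first must be preceded by a comma.
        if (count > 0) {
            if (!expectTokenKind(tokens, i, Kind.comma_syntax)) return null;
            i += 1;
        }
        if (!expectTokenKind(tokens, i, Kind.identifier)) return null;
        if (!expectTokenKind(tokens, i + 1, Kind.identifier)) return null;
        if (count == out.len) return null;
        out[count] = Column{ .name = tokens[i], .kind = tokens[i + 1] };
        count += 1;
        i += 2;
    }
    return count;
}

test "parses two columns and rejects a missing comma" {
    const ok = [_]Token{
        .{ .kind = .identifier, .text = "id" },
        .{ .kind = .identifier, .text = "int" },
        .{ .kind = .comma_syntax, .text = "," },
        .{ .kind = .identifier, .text = "name" },
        .{ .kind = .identifier, .text = "text" },
        .{ .kind = .right_paren_syntax, .text = ")" },
    };
    var out: [4]Column = undefined;
    try std.testing.expectEqual(@as(?usize, 2), parseColumnList(ok[0..], 0, out[0..]));
    try std.testing.expectEqualStrings("name", out[1].name.text);

    // Same tokens without the comma: the second column is rejected.
    const bad = [_]Token{ ok[0], ok[1], ok[3], ok[4], ok[5] };
    try std.testing.expectEqual(@as(?usize, null), parseColumnList(bad[0..], 0, out[0..]));
}
</pre></div>
<p>Because the out-of-bounds case is treated as "no match", a token list
that ends before the closing parenthesis falls through to one of the
failed checks instead of looping forever. With the code saved to a
file, <code>zig test</code> on that file should run the test block.</p>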
<h4 id="<code>insert-into</code>"><code>INSERT INTO</code></h4><p>And last we've got <code>INSERT INTO</code>. This tree has a table name and a list
of expressions to insert into the table.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">InsertAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">table</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span>
<span class="w"> </span><span class="n">values</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">ExpressionAST</span><span class="p">,</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">InsertAST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"INSERT INTO {s} VALUES ("</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">self</span><span class="p">.</span><span class="n">table</span><span class="p">.</span><span class="n">string</span><span class="p">()});</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">values</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">print</span><span class="p">();</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">values</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">", "</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">")</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
<p>We parse it by looking for:</p>
<ul>
<li><code>INSERT INTO</code></li>
<li>Followed by a table name</li>
<li>Followed by <code>VALUES</code></li>
<li>Followed by an open parenthesis</li>
<li>Followed by a comma-separated list of expressions</li>
<li>Followed by a close parenthesis</li>
</ul>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">parseInsert</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">insert_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected INSERT INTO keyword"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected table name after INSERT INTO keyword.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected INSERT INTO table name"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">ExpressionAST</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">insert</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">InsertAST</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">],</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">values_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected VALUES keyword.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected VALUES keyword"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">left_paren_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected opening paren after VALUES keyword.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected opening paren"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">right_paren_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">values</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">comma_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected comma.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Expected comma."</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseExpression</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">values</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">ast</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for expression."</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Skip past final paren.</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">"Unexpected token."</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Did not complete parsing INSERT INTO"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">insert</span><span class="p">.</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">values</span><span class="p">.</span><span class="n">items</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">AST</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">insert</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">insert</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
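<p>The comma handling in that <code>while</code> loop is the only subtle part of <code>parseInsert</code>: a comma is required before every expression except the first. Here is a stripped-down sketch of just that loop. It is not code from the project: <code>Kind</code> and <code>countValues</code> are hypothetical names, tokens are reduced to bare kinds, and each expression is assumed to be a single token.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical, simplified token kinds (not the project's Token.Kind).
const Kind = enum { left_paren, right_paren, comma, value };

// Returns how many values sit between the parens, or null on a syntax error.
fn countValues(kinds: []const Kind) ?usize {
    if (kinds.len == 0 or kinds[0] != .left_paren) return null;
    var i: usize = 1;
    var count: usize = 0;
    while (i != kinds.len and kinds[i] != .right_paren) {
        // Every value after the first must be preceded by a comma.
        if (count != 0) {
            if (kinds[i] != .comma) return null;
            i += 1;
        }
        // Assumption: one token per expression.
        if (i == kinds.len or kinds[i] != .value) return null;
        count += 1;
        i += 1;
    }
    // Running out of tokens before the closing paren is an error.
    if (i == kinds.len) return null;
    return count;
}
</pre></div>
<p>Under those assumptions, the kinds for <code>(1, 2)</code> count as two values, while <code>(1 2)</code> and <code>(1, )</code> are both rejected.</p>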
<h4 id="<code>ast</code>"><code>AST</code></h4><p>Finally, we can define the top-level SQL <code>AST</code> as a tagged union of
the three query types above.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">AST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">union</span><span class="p">(</span><span class="k">enum</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">select</span><span class="o">:</span><span class="w"> </span><span class="n">SelectAST</span><span class="p">,</span>
<span class="w"> </span><span class="n">insert</span><span class="o">:</span><span class="w"> </span><span class="n">InsertAST</span><span class="p">,</span>
<span class="w"> </span><span class="n">create_table</span><span class="o">:</span><span class="w"> </span><span class="n">CreateTableAST</span><span class="p">,</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">select</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">select</span><span class="o">|</span><span class="w"> </span><span class="n">select</span><span class="p">.</span><span class="n">print</span><span class="p">(),</span>
<span class="w"> </span><span class="p">.</span><span class="n">insert</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">insert</span><span class="o">|</span><span class="w"> </span><span class="n">insert</span><span class="p">.</span><span class="n">print</span><span class="p">(),</span>
<span class="w"> </span><span class="p">.</span><span class="n">create_table</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">create_table</span><span class="o">|</span><span class="w"> </span><span class="n">create_table</span><span class="p">.</span><span class="n">print</span><span class="p">(),</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
<p>And we can implement <code>parse</code> by checking the kind of the first token and dispatching to the matching parse function.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">select_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseSelect</span><span class="p">(</span><span class="n">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">create_table_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseCreateTable</span><span class="p">(</span><span class="n">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">insert_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseInsert</span><span class="p">(</span><span class="n">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Unknown statement"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">};</span>
</pre></div>
<p>Perfect. For today. :)</p>
<h3 id="storage-(<code>storage.zig</code>,-338-loc)">Storage (<code>storage.zig</code>, 338 LoC)</h3><p>Next we're going to switch contexts completely and think about how
tables and rows will get serialized into bytes that can be stored on
disk.</p>
<p>The storage layer will define a few general helpers for correctly
serializing and deserializing strings and numbers:</p>
<div class="highlight"><pre><span></span><span class="k">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">@</span><span class="n">import</span><span class="p">(</span><span class="s2">"std"</span><span class="p">);</span>
<span class="k">const</span><span class="w"> </span><span class="n">RocksDB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">@</span><span class="n">import</span><span class="p">(</span><span class="s2">"rocksdb.zig"</span><span class="p">)</span><span class="o">.</span><span class="n">RocksDB</span><span class="p">;</span>
<span class="k">const</span><span class="w"> </span><span class="n">Error</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">@</span><span class="n">import</span><span class="p">(</span><span class="s2">"types.zig"</span><span class="p">)</span><span class="o">.</span><span class="n">Error</span><span class="p">;</span>
<span class="k">const</span><span class="w"> </span><span class="n">Result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">@</span><span class="n">import</span><span class="p">(</span><span class="s2">"types.zig"</span><span class="p">)</span><span class="o">.</span><span class="n">Result</span><span class="p">;</span>
<span class="k">const</span><span class="w"> </span><span class="nb nb-Type">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">@</span><span class="n">import</span><span class="p">(</span><span class="s2">"types.zig"</span><span class="p">)</span><span class="o">.</span><span class="n">String</span><span class="p">;</span>
<span class="n">pub</span><span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">serializeInteger</span><span class="p">(</span><span class="n">comptime</span><span class="w"> </span><span class="n">T</span><span class="p">:</span><span class="w"> </span><span class="n">type</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="o">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">u8</span><span class="p">),</span><span class="w"> </span><span class="n">i</span><span class="p">:</span><span class="w"> </span><span class="n">T</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="nb nb-Type">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">length</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="err">@</span><span class="n">sizeOf</span><span class="p">(</span><span class="n">T</span><span class="p">)]</span><span class="n">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">undefined</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="o">.</span><span class="n">mem</span><span class="o">.</span><span class="n">writeIntBig</span><span class="p">(</span><span class="n">T</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">length</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="n">try</span><span class="w"> </span><span class="n">buf</span><span class="o">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="n">length</span><span class="p">[</span><span class="mf">0.</span><span class="o">.</span><span class="err">@</span><span class="n">sizeOf</span><span class="p">(</span><span class="n">T</span><span class="p">)]);</span>
<span class="p">}</span>
<span class="n">pub</span><span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">deserializeInteger</span><span class="p">(</span><span class="n">comptime</span><span class="w"> </span><span class="n">T</span><span class="p">:</span><span class="w"> </span><span class="n">type</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">:</span><span class="w"> </span><span class="nb nb-Type">String</span><span class="p">)</span><span class="w"> </span><span class="n">T</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">std</span><span class="o">.</span><span class="n">mem</span><span class="o">.</span><span class="n">readIntBig</span><span class="p">(</span><span class="n">T</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">[</span><span class="mf">0.</span><span class="o">.</span><span class="err">@</span><span class="n">sizeOf</span><span class="p">(</span><span class="n">T</span><span class="p">)]);</span>
<span class="p">}</span>
<span class="n">pub</span><span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">serializeBytes</span><span class="p">(</span><span class="n">buf</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="o">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">u8</span><span class="p">),</span><span class="w"> </span><span class="n">bytes</span><span class="p">:</span><span class="w"> </span><span class="nb nb-Type">String</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="nb nb-Type">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">try</span><span class="w"> </span><span class="n">serializeInteger</span><span class="p">(</span><span class="n">u64</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">bytes</span><span class="o">.</span><span class="n">len</span><span class="p">);</span>
<span class="w"> </span><span class="n">try</span><span class="w"> </span><span class="n">buf</span><span class="o">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="n">bytes</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">pub</span><span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">deserializeBytes</span><span class="p">(</span><span class="n">bytes</span><span class="p">:</span><span class="w"> </span><span class="nb nb-Type">String</span><span class="p">)</span><span class="w"> </span><span class="n">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">offset</span><span class="p">:</span><span class="w"> </span><span class="n">usize</span><span class="p">,</span>
<span class="w"> </span><span class="n">bytes</span><span class="p">:</span><span class="w"> </span><span class="nb nb-Type">String</span><span class="p">,</span>
<span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">deserializeInteger</span><span class="p">(</span><span class="n">u64</span><span class="p">,</span><span class="w"> </span><span class="n">bytes</span><span class="p">);</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">8</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">.</span><span class="p">{</span><span class="w"> </span><span class="o">.</span><span class="n">offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">offset</span><span class="p">,</span><span class="w"> </span><span class="o">.</span><span class="n">bytes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">bytes</span><span class="p">[</span><span class="mf">8.</span><span class="o">.</span><span class="n">offset</span><span class="p">]</span><span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
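<p>To make the byte layout concrete, here is a minimal standalone sketch of
the same length-prefixed format: an 8-byte big-endian length followed by
that many payload bytes. The names <code>Field</code> and <code>readField</code> are made up
for this illustration and are not part of the project, and the length is
decoded by hand rather than with <code>std.mem.readIntBig</code> so the sketch
does not depend on a particular version of the standard library.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical result type for this sketch: the decoded payload plus
// the number of bytes consumed from the start of the input.
const Field = struct {
    offset: usize,
    bytes: []const u8,
};

// Decode one field: an 8-byte big-endian length, then the payload.
// Assumes the input is well formed (at least 8 + length bytes long).
fn readField(data: []const u8) Field {
    var length: usize = 0;
    for (data[0..8]) |b| {
        length = length * 256 + b;
    }
    const offset = length + 8;
    return Field{ .offset = offset, .bytes = data[8..offset] };
}

test "two fields back to back" {
    // "hello" (length 5) followed by "hi" (length 2).
    const data = "\x00\x00\x00\x00\x00\x00\x00\x05hello" ++
        "\x00\x00\x00\x00\x00\x00\x00\x02hi";

    const first = readField(data);
    try std.testing.expect(std.mem.eql(u8, first.bytes, "hello"));
    try std.testing.expect(first.offset == 13);

    // The returned offset is where the next field starts.
    const second = readField(data[first.offset..]);
    try std.testing.expect(std.mem.eql(u8, second.bytes, "hi"));
    try std.testing.expect(second.offset == 10);
}
</pre></div>
<p>Returning the offset alongside the bytes is what lets a caller walk
several fields packed into one buffer without any separator.</p>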
<p>Then we'll define the <code>Storage</code> struct itself. Under the hood it will
use RocksDB to store and recover data on disk.</p>
<div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">db</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">,</span>
<span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">db</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">)</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Storage</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">db</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">allocator</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Now let's think about storage entities.</p>
<h4 id="values">Values</h4><p>The fundamental unit in the database is a value, or cell. It can be
a boolean, an integer, a string, or null.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">union</span><span class="p">(</span><span class="k">enum</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">bool_value</span><span class="o">:</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span>
<span class="w"> </span><span class="n">null_value</span><span class="o">:</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span>
<span class="w"> </span><span class="n">string_value</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span>
<span class="w"> </span><span class="n">integer_value</span><span class="o">:</span><span class="w"> </span><span class="kt">i64</span><span class="p">,</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">TRUE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">FALSE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">NULL</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">};</span>
</pre></div>
<p>Since every value arrives as a string in the original query, we'll provide a
<code>fromIntegerString</code> helper to convert one to an integer. Input that
doesn't parse as a base-10 integer becomes <code>0</code>.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">fromIntegerString</span><span class="p">(</span><span class="n">iBytes</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">parseInt</span><span class="p">(</span><span class="kt">i64</span><span class="p">,</span><span class="w"> </span><span class="n">iBytes</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
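<p>The fallback is easy to miss, so here is a small standalone sketch of
the same parse-or-default idea. <code>parseOrZero</code> is a hypothetical name
used only for this example; it returns a plain <code>i64</code> rather than a
<code>Value</code> to keep the snippet self-contained.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical standalone version of the same idea: parse a base-10
// integer, falling back to 0 when the text is not a valid integer.
fn parseOrZero(text: []const u8) i64 {
    return std.fmt.parseInt(i64, text, 10) catch 0;
}

test "invalid input becomes zero" {
    try std.testing.expect(parseOrZero("42") == 42);
    try std.testing.expect(parseOrZero("-7") == -7);
    try std.testing.expect(parseOrZero("abc") == 0);
    try std.testing.expect(parseOrZero("") == 0);
}
</pre></div>
<p>The trade-off is that a mistyped literal silently turns into zero. A
stricter variant could return an error union and let the caller decide.</p>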
<p>Next we'll define a function to cast a value to a boolean. Null, the empty string, and zero are all false.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">asBool</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Value</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="kc">false</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">value</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>To strings.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">asString</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Value</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">))</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="c1">// Do nothing</span>
<span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="s">"true"</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="s">"false"</span><span class="p">),</span>
<span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="n">value</span><span class="p">),</span>
<span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">"{d}"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">value</span><span class="p">}),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And to integers.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">asInteger</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Value</span><span class="p">)</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">fromIntegerString</span><span class="p">(</span><span class="n">value</span><span class="p">).</span><span class="n">integer_value</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">value</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And finally the storage layer's core concern: serialization...</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">serialize</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Value</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">))</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="s">'0'</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="s">'1'</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="s">'1'</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="s">'0'</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="s">'2'</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="s">'3'</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="n">serializeInteger</span><span class="p">(</span><span class="kt">i64</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">items</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And deserialization.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">deserialize</span><span class="p">(</span><span class="n">data</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">'0'</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Value</span><span class="p">.</span><span class="n">NULL</span><span class="p">,</span>
<span class="w"> </span><span class="s">'1'</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">'1'</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s">'2'</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">..]</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="s">'3'</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">deserializeInteger</span><span class="p">(</span><span class="kt">i64</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">..])</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
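<p>We use a simple, space-inefficient scheme for encoding/decoding to
bytes that can be written to disk: a one-byte ASCII tag (<code>'0'</code> through
<code>'3'</code>) identifies the type, and the payload, if any, follows.</p>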
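<p>As a rough illustration of where the space goes, the sketch below reports
how many payload bytes follow the tag for each kind of cell.
<code>payloadLen</code> is a hypothetical helper, not part of the project, and it
assumes <code>serializeInteger</code> writes <code>@sizeOf(T)</code> bytes, mirroring what
<code>deserializeInteger</code> reads.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical helper for this sketch: how many payload bytes follow
// the one-byte ASCII tag in a serialized cell.
fn payloadLen(cell: []const u8) usize {
    return switch (cell[0]) {
        '0' => 0, // null: the tag is the whole cell
        '1' => 1, // boolean: one more byte, '0' or '1'
        '2' => cell.len - 1, // string: the rest of the cell, no length prefix
        '3' => 8, // integer: always a full 8-byte big-endian i64
        else => unreachable,
    };
}

test "cell sizes" {
    try std.testing.expect(payloadLen("0") == 0);
    try std.testing.expect(payloadLen("11") == 1);
    try std.testing.expect(payloadLen("2hi") == 2);
    // Even the integer 1 occupies nine bytes in total.
    try std.testing.expect(payloadLen("3\x00\x00\x00\x00\x00\x00\x00\x01") == 8);
}
</pre></div>
<p>A boolean costs two bytes and every integer costs nine, which is the
price of keeping the format trivial to read and write.</p>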
<h4 id="rows">Rows</h4><p>Now that we've got values, we can define rows in terms of values. A row
keeps its cells in serialized form next to the list of field names, and
we provide a few helper functions for getting cells by field name.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Row</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span>
<span class="w"> </span><span class="n">cells</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">),</span>
<span class="w"> </span><span class="n">fields</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">,</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">fields</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Row</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Row</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">allocator</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">cells</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">),</span>
<span class="w"> </span><span class="p">.</span><span class="n">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fields</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">Row</span><span class="p">,</span><span class="w"> </span><span class="n">cell</span><span class="o">:</span><span class="w"> </span><span class="n">Value</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">cellBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">cells</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">cell</span><span class="p">.</span><span class="n">serialize</span><span class="p">(</span><span class="o">&</span><span class="n">cellBuffer</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">appendBytes</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">Row</span><span class="p">,</span><span class="w"> </span><span class="n">cell</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">cells</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">cell</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">get</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Row</span><span class="p">,</span><span class="w"> </span><span class="n">field</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">f</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">field</span><span class="p">,</span><span class="w"> </span><span class="n">f</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Results are internal buffer views. So make a copy.</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="n">copy</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">cells</span><span class="p">.</span><span class="n">items</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">deserialize</span><span class="p">(</span><span class="n">copy</span><span class="p">.</span><span class="n">items</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Value</span><span class="p">.</span><span class="n">NULL</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">items</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Row</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">cells</span><span class="p">.</span><span class="n">items</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">reset</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">Row</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">cells</span><span class="p">.</span><span class="n">clearRetainingCapacity</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
<p>Since values are serialized with length prefixes, we can
serialize a row by concatenating all the values together.</p>
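<p>To make the layout concrete, here is a minimal sketch of that concatenation. It is not the post's <code>serializeBytes</code>: the helper name <code>putCell</code>, the fixed output buffer, and the one-byte length prefix are all assumptions chosen to keep the bytes easy to read.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical helper (not from the post): write `cell` into `out` at `pos`,
// preceded by its length in a single byte. Returns the position just past
// the cell that was written.
fn putCell(out: []u8, pos: usize, cell: []const u8) !usize {
    const len = std.math.cast(u8, cell.len) orelse return error.CellTooLong;
    if (pos > out.len) return error.NoSpaceLeft;
    if (1 + cell.len > out.len - pos) return error.NoSpaceLeft;
    out[pos] = len;
    var i: usize = 0;
    while (i != cell.len) : (i += 1) {
        out[pos + 1 + i] = cell[i];
    }
    return pos + 1 + cell.len;
}

test "a row is its cells, each length-prefixed, back to back" {
    var buf: [32]u8 = undefined;
    var end: usize = 0;
    end = try putCell(&buf, end, "alice");
    end = try putCell(&buf, end, "30");
    // One length byte, the cell, the next length byte, the next cell.
    try std.testing.expectEqualSlices(u8, "\x05alice\x0230", buf[0..end]);
}
</pre></div>
<p>With this encoding the two cells <code>alice</code> and <code>30</code> become nine bytes. No separator is needed between cells because each length says where its cell ends.</p>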
<p>Since we must map to keys and values for RocksDB, we give each row a
key prefix built from the table name: <code>row_</code>, then the table name,
then <code>_</code>. And then we give it a random suffix to distinguish it from
other rows in the table. A more intelligent design would use the
table's primary key as the suffix but we don't support primary keys
yet. (See also the section on "Mapping SQL to key-value storage" in
<a href="https://notes.eatonphil.com/whats-the-big-deal-about-key-value-databases.html">What's the big deal about key-value databases
like FoundationDB and
RocksDB?</a>.)</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">generateId</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="p">[</span><span class="mi">16</span><span class="p">]</span><span class="kt">u8</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">cwd</span><span class="p">().</span><span class="n">openFileZ</span><span class="p">(</span><span class="s">"/dev/random"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">buf</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="mi">16</span><span class="p">]</span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">.{};</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="o">&</span><span class="n">buf</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Return the array by value. A slice of buf would point at this function's stack after it returns.</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">buf</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeRow</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">,</span><span class="w"> </span><span class="n">table</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span class="o">:</span><span class="w"> </span><span class="n">Row</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="n">Error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Table name prefix</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">"row_{s}_"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">table</span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Could not allocate row key"</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Unique row id</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">generateId</span><span class="p">()</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Could not generate id"</span><span class="p">;</span>
<span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="o">&</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Could not allocate for id"</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">row</span><span class="p">.</span><span class="n">cells</span><span class="p">.</span><span class="n">items</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">cell</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">serializeBytes</span><span class="p">(</span><span class="o">&</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">cell</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Could not allocate for cell"</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">items</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
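<p>As a standalone illustration of the key layout, here is a small sketch that builds a row key in a fixed buffer. It is a simplification under stated assumptions: the helper name <code>rowKey</code> is made up, and it swaps in <code>std.crypto.random</code> for the <code>/dev/random</code> read used above.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical helper (not from the post): write "row_" + table + "_" followed
// by 16 random bytes into `buf`, and return the part of `buf` that was filled.
fn rowKey(buf: []u8, table: []const u8) ![]u8 {
    const prefix = try std.fmt.bufPrint(buf, "row_{s}_", .{table});
    const rest = buf[prefix.len..];
    if (16 > rest.len) return error.NoSpaceLeft;
    // Assumption: std.crypto.random stands in for reading /dev/random.
    std.crypto.random.bytes(rest[0..16]);
    return buf[0 .. prefix.len + 16];
}

test "row keys share the table prefix and differ in the suffix" {
    var a_buf: [64]u8 = undefined;
    var b_buf: [64]u8 = undefined;
    const a = try rowKey(&a_buf, "users");
    const b = try rowKey(&b_buf, "users");
    try std.testing.expect(std.mem.startsWith(u8, a, "row_users_"));
    try std.testing.expect(std.mem.startsWith(u8, b, "row_users_"));
    try std.testing.expectEqual(@as(usize, "row_users_".len + 16), a.len);
    try std.testing.expect(!std.mem.eql(u8, a, b));
}
</pre></div>
<p>Every key for a table starts with the same bytes, which is what lets us find a table's rows later. The random suffix only has to be unique. It carries no meaning, which is why a primary key would be the better suffix.</p>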
<h4 id="rowiter">RowIter</h4><p>Reading rows will be slightly different from writing rows since
reading rows will use an iterator. We will wrap the RocksDB iterator
so the consumer of <code>Storage</code> only needs to deal with <code>Row</code>s and
<code>Value</code>s.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">RowIter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">row</span><span class="o">:</span><span class="w"> </span><span class="n">Row</span><span class="p">,</span>
<span class="w"> </span><span class="n">iter</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">.</span><span class="n">Iter</span><span class="p">,</span>
<span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">iter</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">.</span><span class="n">Iter</span><span class="p">,</span><span class="w"> </span><span class="n">fields</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">RowIter</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">RowIter</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">iter</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">row</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Row</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">fields</span><span class="p">),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">next</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">RowIter</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="n">Row</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rowBytes</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">iter</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">b</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">rowBytes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">value</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">row</span><span class="p">.</span><span class="n">reset</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">offset</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">offset</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">rowBytes</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">d</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">deserializeBytes</span><span class="p">(</span><span class="n">rowBytes</span><span class="p">[</span><span class="n">offset</span><span class="p">..]);</span>
<span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">d</span><span class="p">.</span><span class="n">offset</span><span class="p">;</span>
<span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">row</span><span class="p">.</span><span class="n">appendBytes</span><span class="p">(</span><span class="n">d</span><span class="p">.</span><span class="n">bytes</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">row</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">close</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">RowIter</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">iter</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
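<p><code>RowIter.next</code> does the opposite of what <code>writeRow</code> did: it
deserializes cells one after another. Again, this works because each
cell is length-prefixed. Each call resets and refills the same <code>Row</code>,
so a returned row is only valid until the next call to <code>next</code>.</p>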
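<p>The offset bookkeeping is the part worth seeing in isolation. Below is a minimal sketch of that loop over a hand-written byte string. The names <code>Decoded</code> and <code>takeCell</code> are hypothetical, and the one-byte length prefix is an assumption made for readability, not necessarily the format <code>deserializeBytes</code> uses. Unlike the loop above, the sketch also returns an error on truncated input.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical result type: the decoded cell plus how many input bytes it used.
const Decoded = struct {
    bytes: []const u8,
    offset: usize,
};

// Hypothetical decoder: one length byte followed by that many bytes of data.
fn takeCell(input: []const u8) !Decoded {
    if (input.len == 0) return error.Truncated;
    const len: usize = input[0];
    if (len > input.len - 1) return error.Truncated;
    return Decoded{ .bytes = input[1 .. 1 + len], .offset = 1 + len };
}

test "walk a row by advancing past each decoded cell" {
    const row = "\x05alice\x0230";
    var cells: [4][]const u8 = undefined;
    var count: usize = 0;
    var offset: usize = 0;
    while (offset != row.len) {
        const d = try takeCell(row[offset..]);
        cells[count] = d.bytes; // a view into `row`, not a copy
        count += 1;
        offset += d.offset;
    }
    try std.testing.expectEqual(@as(usize, 2), count);
    try std.testing.expectEqualStrings("alice", cells[0]);
    try std.testing.expectEqualStrings("30", cells[1]);
}
</pre></div>
<p>Each decoded cell is a slice of the input rather than a copy, so it is only usable while the underlying bytes are.</p>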
<p>Next we must provide the interface for actually getting a
<code>RowIter</code>. The only condition for the <code>RowIter</code> at the moment is that
it contains all rows in the table.</p>
<p>Since we wrote each row with a table name prefix, we can recover a
table's rows by iterating over all keys with that prefix.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">getRowIter</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">,</span><span class="w"> </span><span class="n">table</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">RowIter</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rowPrefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="n">rowPrefix</span><span class="p">.</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">"row_{s}_"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">table</span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for row prefix"</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">iter</span><span class="p">(</span><span class="n">rowPrefix</span><span class="p">.</span><span class="n">items</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">it</span><span class="o">|</span><span class="w"> </span><span class="n">it</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">tableInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">getTable</span><span class="p">(</span><span class="n">table</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">t</span><span class="o">|</span><span class="w"> </span><span class="n">t</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RowIter</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">iter</span><span class="p">,</span><span class="w"> </span><span class="n">tableInfo</span><span class="p">.</span><span class="n">columns</span><span class="p">),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
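<p>To see why a prefix is enough to find a table's rows, here is a small sketch with an in-memory list of keys standing in for the store. Everything in it is an assumption for illustration: the key list, the readable suffixes in place of random bytes, and the helper <code>countWithPrefix</code>. It says nothing about how the RocksDB iterator is implemented.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical stand-in for the key-value store: keys held in sorted order.
const keys = [_][]const u8{
    "row_orders_aa",
    "row_users_01",
    "row_users_02",
    "tbl_users_",
};

// Count the keys that start with `prefix`. Because the keys are sorted, the
// matches sit next to each other, so the scan stops at the first miss that
// follows a hit.
fn countWithPrefix(prefix: []const u8) usize {
    var count: usize = 0;
    var i: usize = 0;
    while (i != keys.len) : (i += 1) {
        if (std.mem.startsWith(u8, keys[i], prefix)) {
            count += 1;
        } else if (count != 0) {
            break;
        }
    }
    return count;
}

test "rows for one table are the keys under its prefix" {
    try std.testing.expectEqual(@as(usize, 2), countWithPrefix("row_users_"));
    try std.testing.expectEqual(@as(usize, 1), countWithPrefix("row_orders_"));
    try std.testing.expectEqual(@as(usize, 0), countWithPrefix("row_missing_"));
}
</pre></div>
<p>One thing the sketch makes visible: a prefix match is purely textual. In this sketch a key for a table named <code>users_old</code> would also start with <code>row_users_</code>, so table names that extend one another would need extra care.</p>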
<h4 id="tables">Tables</h4><p>Finally we've got tables. We must store each table's metadata: its
name, columns and column types.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span>
<span class="w"> </span><span class="n">columns</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">,</span>
<span class="w"> </span><span class="n">types</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
<p>We will use a <code>tbl_</code> prefix instead of a <code>row_</code> prefix for table
metadata. But we'll otherwise encode with the same length-prefixed
concatenations.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeTable</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">,</span><span class="w"> </span><span class="n">table</span><span class="o">:</span><span class="w"> </span><span class="n">Table</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="n">Error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Table name prefix</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">"tbl_{s}_"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">table</span><span class="p">.</span><span class="n">name</span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Could not allocate key for table"</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">table</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">column</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">serializeBytes</span><span class="p">(</span><span class="o">&</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">column</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Could not allocate for column"</span><span class="p">;</span>
<span class="w"> </span><span class="n">serializeBytes</span><span class="p">(</span><span class="o">&</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">table</span><span class="p">.</span><span class="n">types</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Could not allocate for column type"</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">items</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And the opposite for decoding.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">getTable</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">Table</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">tableKey</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="n">tableKey</span><span class="p">.</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">"tbl_{s}_"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">name</span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for table prefix"</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Table</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">name</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="c1">// First grab table info</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columnInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">tableKey</span><span class="p">.</span><span class="n">items</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="n">val</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">not_found</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"No such table"</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columnOffset</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">columnOffset</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">columnInfo</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">deserializeBytes</span><span class="p">(</span><span class="n">columnInfo</span><span class="p">[</span><span class="n">columnOffset</span><span class="p">..]);</span>
<span class="w"> </span><span class="n">columnOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">offset</span><span class="p">;</span>
<span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">column</span><span class="p">.</span><span class="n">bytes</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for column name."</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">deserializeBytes</span><span class="p">(</span><span class="n">columnInfo</span><span class="p">[</span><span class="n">columnOffset</span><span class="p">..]);</span>
<span class="w"> </span><span class="n">columnOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">kind</span><span class="p">.</span><span class="n">offset</span><span class="p">;</span>
<span class="w"> </span><span class="n">types</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">kind</span><span class="p">.</span><span class="n">bytes</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for column kind."</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">table</span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">;</span>
<span class="w"> </span><span class="n">table</span><span class="p">.</span><span class="n">types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">types</span><span class="p">.</span><span class="n">items</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">};</span>
</pre></div>
<p>And that's it for storage! Again, we're building on top of the
<a href="https://notes.eatonphil.com/zigrocks.html">RocksDB layer</a> I already
wrote about. If you want to see how that layer works, read that post.</p>
<p>If you just want the <code>rocksdb.zig</code> file, grab it from
<a href="https://github.com/eatonphil/zigrocks/blob/7831e390f4044bb999507fd6d0e23bb2475756f8/rocksdb.zig">here</a>.</p>
<h3 id="execute-(<code>execute.zig</code>,-210-loc)">Execute (<code>execute.zig</code>, 210 LoC)</h3><p>Now that we've got a storage layer and an AST from our parser, we can
execute the query on top of the storage!</p>
<p>A better implementation might translate the AST to bytecode and
implement a bytecode interpreter for expression evaluation. But we'll
build a tree-walking interpreter instead.</p>
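<p>As a rough illustration of what "tree-walking" means, here is a
minimal sketch over a hypothetical <code>Expr</code> type. It is not
the parser's actual AST, and the names <code>Expr</code> and
<code>eval</code> are assumptions made up for this example. Evaluation
recurses into the children of a node and then combines their
results.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical AST for illustration only; not Parser.ExpressionAST.
const Expr = union(enum) {
    integer: i64,
    add: struct { left: *const Expr, right: *const Expr },
};

// Tree-walking evaluation: evaluate both children, then combine them.
fn eval(e: Expr) i64 {
    return switch (e) {
        .integer => |n| n,
        .add => |op| eval(op.left.*) + eval(op.right.*),
    };
}

test "evaluate 1 + 2 by walking the tree" {
    const one = Expr{ .integer = 1 };
    const two = Expr{ .integer = 2 };
    const sum = Expr{ .add = .{ .left = &one, .right = &two } };
    try std.testing.expect(eval(sum) == 3);
}
</pre></div>
<p>The executor that follows has the same recursive shape, but it
walks the parser's AST and produces storage values instead of plain
integers.</p>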
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">Parser</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"parse.zig"</span><span class="p">).</span><span class="n">Parser</span><span class="p">;</span>
<span class="kr">const</span><span class="w"> </span><span class="n">RocksDB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"rocksdb.zig"</span><span class="p">).</span><span class="n">RocksDB</span><span class="p">;</span>
<span class="kr">const</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"storage.zig"</span><span class="p">).</span><span class="n">Storage</span><span class="p">;</span>
<span class="kr">const</span><span class="w"> </span><span class="n">Result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"types.zig"</span><span class="p">).</span><span class="n">Result</span><span class="p">;</span>
<span class="kr">const</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"types.zig"</span><span class="p">).</span><span class="n">String</span><span class="p">;</span>
<span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Executor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span>
<span class="w"> </span><span class="n">storage</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">,</span>
<span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">storage</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">)</span><span class="w"> </span><span class="n">Executor</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Executor</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">storage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">storage</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>In general we'll make query responses optional. A response is either
empty, or it holds an array of arrays of strings (the rows and their cells)
and an array of strings (the column names).</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">QueryResponse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">fields</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">,</span>
<span class="w"> </span><span class="c1">// Array of rows (each an array of cells (each a serialized array of u8))</span>
<span class="w"> </span><span class="n">rows</span><span class="o">:</span><span class="w"> </span><span class="p">[][]</span><span class="n">String</span><span class="p">,</span>
<span class="w"> </span><span class="n">empty</span><span class="o">:</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">QueryResponseResult</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">QueryResponse</span><span class="p">);</span>
</pre></div>
<h4 id="expressions">Expressions</h4><p>For execution we start again at the bottom, with expressions. The simplest are literals: strings, integers, and identifiers, which are looked up in the current row.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">executeExpression</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Executor</span><span class="p">,</span><span class="w"> </span><span class="n">e</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">.</span><span class="n">ExpressionAST</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Row</span><span class="p">)</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">literal</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">lit</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">lit</span><span class="p">.</span><span class="n">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">string</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lit</span><span class="p">.</span><span class="n">string</span><span class="p">()</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">integer</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">fromIntegerString</span><span class="p">(</span><span class="n">lit</span><span class="p">.</span><span class="n">string</span><span class="p">()),</span>
<span class="w"> </span><span class="p">.</span><span class="n">identifier</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">row</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">lit</span><span class="p">.</span><span class="n">string</span><span class="p">()),</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
</pre></div>
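<p>And there are a handful of binary operations: equality, concatenation, less-than, and addition. Equality casts operands of dissimilar types to strings before comparing them.</p>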
<div class="highlight"><pre><span></span><span class="w"> </span><span class="p">.</span><span class="n">binary_operation</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">bin_op</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">executeExpression</span><span class="p">(</span><span class="n">bin_op</span><span class="p">.</span><span class="n">left</span><span class="p">.</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">executeExpression</span><span class="p">(</span><span class="n">bin_op</span><span class="p">.</span><span class="n">right</span><span class="p">.</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bin_op</span><span class="p">.</span><span class="n">operator</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="p">.</span><span class="n">equal_operator</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Cast dissimilar types to serde</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">@enumToInt</span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">@enumToInt</span><span class="p">(</span><span class="n">right</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">leftBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&</span><span class="n">leftBuf</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">leftBuf</span><span class="p">.</span><span class="n">items</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rightBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&</span><span class="n">rightBuf</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rightBuf</span><span class="p">.</span><span class="n">items</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">v</span><span class="o">|</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asBool</span><span class="p">(),</span>
<span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">blk</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">leftBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&</span><span class="n">leftBuf</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rightBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&</span><span class="n">rightBuf</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="o">:</span><span class="n">blk</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">leftBuf</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">rightBuf</span><span class="p">.</span><span class="n">items</span><span class="p">);</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">asInteger</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asInteger</span><span class="p">(),</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bin_op</span><span class="p">.</span><span class="n">operator</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="p">.</span><span class="n">concat_operator</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&</span><span class="n">copy</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&</span><span class="n">copy</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">copy</span><span class="p">.</span><span class="n">items</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">bin_op</span><span class="p">.</span><span class="n">operator</span><span class="p">.</span><span class="n">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">lt_operator</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">left</span><span class="p">.</span><span class="n">asInteger</span><span class="p">()</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asInteger</span><span class="p">())</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">TRUE</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">FALSE</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">plus_operator</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">asInteger</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asInteger</span><span class="p">()</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">NULL</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
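<p>The equality rule is the subtle part of the code above, so here is a
small standalone sketch of it. It assumes a hypothetical two-variant
<code>Value</code> union and a hypothetical <code>text</code> helper in
place of the real <code>Storage.Value</code> and
<code>asString</code>. Values with the same tag are compared natively,
and values with different tags are both rendered as text and compared
byte for byte.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical stand-in for Storage.Value with only two variants.
const Value = union(enum) {
    integer_value: i64,
    string_value: []const u8,

    // Render the value as text. Integers are formatted into buf.
    fn text(self: Value, buf: []u8) []const u8 {
        return switch (self) {
            .string_value => |s| s,
            .integer_value => |n| std.fmt.bufPrint(buf, "{d}", .{n}) catch unreachable,
        };
    }
};

fn equal(left: Value, right: Value) bool {
    // Same tag: compare the payloads directly.
    if (std.meta.activeTag(left) == std.meta.activeTag(right)) {
        return switch (left) {
            .integer_value => |n| n == right.integer_value,
            .string_value => |s| std.mem.eql(u8, s, right.string_value),
        };
    }
    // Different tags: cast both sides to text, then compare bytes.
    var lbuf: [32]u8 = undefined;
    var rbuf: [32]u8 = undefined;
    return std.mem.eql(u8, left.text(&lbuf), right.text(&rbuf));
}

test "dissimilar types compare as text" {
    try std.testing.expect(equal(Value{ .integer_value = 1 }, Value{ .string_value = "1" }));
    try std.testing.expect(!equal(Value{ .integer_value = 1 }, Value{ .string_value = "one" }));
}
</pre></div>
<p>One consequence of this design is that a comparison like
<code>1 = '1'</code> is true. That is convenient for a toy database,
though a stricter implementation might reject comparisons between
mismatched types instead.</p>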
<h4 id="<code>select</code>"><code>SELECT</code></h4><p>To execute a <code>SELECT</code> query we first validate the requested table and
requested fields.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">executeSelect</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Executor</span><span class="p">,</span><span class="w"> </span><span class="n">s</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">.</span><span class="n">SelectAST</span><span class="p">)</span><span class="w"> </span><span class="n">QueryResponseResult</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">storage</span><span class="p">.</span><span class="n">getTable</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">from</span><span class="p">.</span><span class="n">string</span><span class="p">()))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Now collect a name for each requested column</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">requestedFields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">requestedColumn</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">fieldName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">requestedColumn</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">literal</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">lit</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">lit</span><span class="p">.</span><span class="n">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">identifier</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">lit</span><span class="p">.</span><span class="n">string</span><span class="p">(),</span>
<span class="w"> </span><span class="c1">// TODO: give reasonable names</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="s">"unknown"</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="c1">// TODO: give reasonable names</span>
<span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="s">"unknown"</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">requestedFields</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">fieldName</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for requested field."</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Then grab an iterator for rows in the table.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">([]</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">QueryResponse</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">requestedFields</span><span class="p">.</span><span class="n">items</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">empty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">storage</span><span class="p">.</span><span class="n">getRowIter</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">from</span><span class="p">.</span><span class="n">string</span><span class="p">()))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">it</span><span class="o">|</span><span class="w"> </span><span class="n">it</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">iter</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
</pre></div>
<p>And finally we iterate through all rows and add a row to the response
if there is no <code>WHERE</code> condition or if the <code>WHERE</code>
condition evaluates to true for that row.</p>
<p>When we add rows to the response, we need to actually evaluate the
expression for each column in the <code>SELECT</code> AST.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">iter</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">row</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">where</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">where</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">executeExpression</span><span class="p">(</span><span class="n">where</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span class="p">).</span><span class="n">asBool</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">add</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">requested</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">exp</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">executeExpression</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">valBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
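<span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&</span><span class="n">valBuf</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not serialize requested cell"</span><span class="w"> </span><span class="p">};</span>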
<span class="w"> </span><span class="n">requested</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">valBuf</span><span class="p">.</span><span class="n">items</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for requested cell"</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">rows</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">requested</span><span class="p">.</span><span class="n">items</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for row"</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">response</span><span class="p">.</span><span class="n">rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rows</span><span class="p">.</span><span class="n">items</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
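<p>The row-filtering rule is easy to lose among the allocation handling, so
here is a minimal standalone sketch of just that rule. It is not the code
from this project: <code>Where</code>, <code>keepRow</code> and <code>countKept</code> are hypothetical
names, rows are fixed-size integer arrays, and the only condition
supported is equality on a single column.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical stand-in for a parsed WHERE clause: "column == equals".
const Where = struct {
    column: usize,
    equals: i64,
};

// A row is kept when there is no WHERE clause at all, or when the
// clause evaluates to true for that row.
fn keepRow(where: ?Where, row: [2]i64) bool {
    if (where) |w| {
        return row[w.column] == w.equals;
    }
    return true;
}

// Count the rows that would end up in the response.
fn countKept(where: ?Where, rows: []const [2]i64) usize {
    var kept: usize = 0;
    for (rows) |row| {
        if (keepRow(where, row)) kept += 1;
    }
    return kept;
}

pub fn main() void {
    const rows = [_][2]i64{ [2]i64{ 1, 10 }, [2]i64{ 2, 20 }, [2]i64{ 3, 10 } };
    std.debug.print("no WHERE: {d}\n", .{countKept(null, rows[0..])});
    std.debug.print("WHERE col1 = 10: {d}\n", .{countKept(Where{ .column = 1, .equals = 10 }, rows[0..])});
}
</pre></div>
<p>The optional capture (<code>if (where) |w|</code>) plays the same role as the
<code>if (s.where) |where|</code> branch above, with the <code>add</code> flag folded into
the return value.</p>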
<h4 id="<code>insert-into</code>"><code>INSERT INTO</code></h4><p>Inserting is pretty simple: we evaluate each of the <code>VALUES</code> passed and
write the resulting row to storage.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">executeInsert</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Executor</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">.</span><span class="n">InsertAST</span><span class="p">)</span><span class="w"> </span><span class="n">QueryResponseResult</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">emptyRow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Row</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="kc">undefined</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">row</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Row</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="kc">undefined</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">values</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">v</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">exp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">executeExpression</span><span class="p">(</span><span class="n">v</span><span class="p">,</span><span class="w"> </span><span class="n">emptyRow</span><span class="p">);</span>
<span class="w"> </span><span class="n">row</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">exp</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for cell"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">storage</span><span class="p">.</span><span class="n">writeRow</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">table</span><span class="p">.</span><span class="n">string</span><span class="p">(),</span><span class="w"> </span><span class="n">row</span><span class="p">))</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">empty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<h4 id="<code>create-table</code>"><code>CREATE TABLE</code></h4><p><code>CREATE TABLE</code> works much like <code>INSERT INTO</code> but without any
expression evaluation: we map the <code>CreateTableAST</code> to <code>Storage</code>
entities and write them to storage.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">executeCreateTable</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Executor</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">.</span><span class="n">CreateTableAST</span><span class="p">)</span><span class="w"> </span><span class="n">QueryResponseResult</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">column</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">column</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">string</span><span class="p">())</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for column name"</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">types</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">column</span><span class="p">.</span><span class="n">kind</span><span class="p">.</span><span class="n">string</span><span class="p">())</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not allocate for column kind"</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Table</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">table</span><span class="p">.</span><span class="n">string</span><span class="p">(),</span>
<span class="w"> </span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">types</span><span class="p">.</span><span class="n">items</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">storage</span><span class="p">.</span><span class="n">writeTable</span><span class="p">(</span><span class="n">table</span><span class="p">))</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">empty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>For both <code>CREATE TABLE</code> and <code>INSERT INTO</code> there is more validation we
could do. Exercise for the reader and whatnot. :)</p>
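<p>As one example of that extra validation, an insert could be rejected
when the number of values does not match the number of columns in the
table. This is only a sketch under assumptions: <code>Table</code> here is a
pared-down, hypothetical stand-in for the real storage type, and
<code>validateInsert</code> is a made-up helper. It returns an optional error
string so that it can be checked with the same <code>if (...) |err|</code>
shape used for <code>writeRow</code> and <code>writeTable</code>.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical, pared-down stand-in for the table metadata kept in storage.
const Table = struct {
    name: []const u8,
    columns: []const []const u8,
};

// Returns an error message when the number of values does not match
// the number of columns, or null when the counts agree.
fn validateInsert(table: Table, valueCount: usize) ?[]const u8 {
    if (valueCount != table.columns.len) {
        return "Number of values does not match number of columns";
    }
    return null;
}

pub fn main() void {
    const columns = [_][]const u8{ "name", "age" };
    const table = Table{ .name = "people", .columns = columns[0..] };
    // Three values for a two-column table: the check reports an error.
    if (validateInsert(table, 3)) |err| {
        std.debug.print("error: {s}\n", .{err});
    }
}
</pre></div>
<p>A check like this would presumably run before <code>writeRow</code>, once the
table's metadata has been fetched from storage.</p>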
<h4 id="<code>execute</code>"><code>execute</code></h4><p>Finally we can switch on the <code>AST</code> and call the appropriate execution
function.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">execute</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Executor</span><span class="p">,</span><span class="w"> </span><span class="n">ast</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">.</span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="n">QueryResponseResult</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">select</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">select</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">executeSelect</span><span class="p">(</span><span class="n">select</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">insert</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">insert</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">executeInsert</span><span class="p">(</span><span class="n">insert</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">create_table</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">createTable</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">executeCreateTable</span><span class="p">(</span><span class="n">createTable</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">};</span>
</pre></div>
<p>And now we're ready to put it all together in <code>main</code>!</p>
<h3 id="<code>main</code>-(<code>main.zig</code>,-144-loc)"><code>main</code> (<code>main.zig</code>, 144 LoC)</h3><p>First we set up our arena allocator.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">RocksDB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"rocksdb.zig"</span><span class="p">).</span><span class="n">RocksDB</span><span class="p">;</span>
<span class="kr">const</span><span class="w"> </span><span class="n">lex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"lex.zig"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">parse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"parse.zig"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">execute</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"execute.zig"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"storage.zig"</span><span class="p">).</span><span class="n">Storage</span><span class="p">;</span>
<span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">arena</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">);</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">();</span>
</pre></div>
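<p>As a minimal sketch of why an arena is convenient here (the helper <code>joinedLen</code> is hypothetical and not part of this project): every allocation made through the arena's allocator is released by the single deferred <code>deinit</code>, so nothing has to be freed individually.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical helper for illustration only: joins two strings using
// arena-backed memory and returns the combined length.
fn joinedLen(a: []const u8, b: []const u8) !usize {
    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    // One deinit frees every allocation made through this arena.
    defer arena.deinit();
    const allocator = arena.allocator();

    // Two separate allocations, neither of which is freed by hand.
    const joined = try std.mem.concat(allocator, u8, &[_][]const u8{ a, b });
    const copy = try allocator.dupe(u8, joined);
    return copy.len;
}
</pre></div>
<p>That trade-off suits a short-lived CLI process like this one, where holding memory until <code>main</code> returns is acceptable.</p>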
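<p>Then we parse CLI arguments. Two flags are required: <code>--database</code>, a directory on disk where RocksDB stores its data, and <code>--script</code>, a file containing the query to execute.
The loop records the position of each flag's value so it can be looked up in <code>std.os.argv</code> later.</p>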
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">debugTokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">debugAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">process</span><span class="p">.</span><span class="n">args</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">scriptArg</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">databaseArg</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">arg</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">"--debug-tokens"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">debugTokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">"--debug-ast"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">debugAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">"--database"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">databaseArg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">"--script"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">scriptArg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">databaseArg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"--database is a required flag. Should be a directory for data.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">scriptArg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"--script is a required flag. Should be a file containing SQL.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
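<p>The same flag handling can be sketched as a pure function over a slice of strings, which is easier to test than a loop over the process arguments. This is an illustrative alternative, not the code used in this project: the <code>Flags</code> struct and <code>parseFlags</code> function are hypothetical names, and unlike the loop above it stores the flag values directly instead of their positions.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical container for the parsed flags.
const Flags = struct {
    debugTokens: bool = false,
    debugAST: bool = false,
    database: ?[]const u8 = null,
    script: ?[]const u8 = null,
};

// Walks the arguments once. Flags that take a value consume the next
// argument, but only if one is actually present.
fn parseFlags(args: []const []const u8) Flags {
    var flags = Flags{};
    var i: usize = 0;
    while (args.len > i) : (i += 1) {
        const arg = args[i];
        if (std.mem.eql(u8, arg, "--debug-tokens")) {
            flags.debugTokens = true;
        } else if (std.mem.eql(u8, arg, "--debug-ast")) {
            flags.debugAST = true;
        } else if (std.mem.eql(u8, arg, "--database") and args.len > i + 1) {
            i += 1;
            flags.database = args[i];
        } else if (std.mem.eql(u8, arg, "--script") and args.len > i + 1) {
            i += 1;
            flags.script = args[i];
        }
    }
    return flags;
}
</pre></div>
<p>With optionals, a missing required flag shows up as <code>null</code> rather than as a sentinel index of <code>0</code>.</p>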
<p>Next we read the contents of the file passed with <code>--script</code>; this is the query.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">cwd</span><span class="p">().</span><span class="n">openFileZ</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">argv</span><span class="p">[</span><span class="n">scriptArg</span><span class="p">],</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">getEndPos</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">prog</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">file_size</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// readAll keeps reading until the buffer is full or EOF; a single read may return fewer bytes.</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">readAll</span><span class="p">(</span><span class="n">prog</span><span class="p">);</span>
</pre></div>
<p>And pass the query to the lexer.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">lex</span><span class="p">.</span><span class="n">Token</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">lexErr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">lex</span><span class="p">(</span><span class="n">prog</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">tokens</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">lexErr</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Failed to lex: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">debugTokens</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">token</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Token: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">token</span><span class="p">.</span><span class="n">string</span><span class="p">()});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Program is empty</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Pass the tokens to the parser.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">parser</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse</span><span class="p">.</span><span class="n">Parser</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">ast</span><span class="o">:</span><span class="w"> </span><span class="n">parse</span><span class="p">.</span><span class="n">Parser</span><span class="p">.</span><span class="n">AST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">;</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">parser</span><span class="p">.</span><span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Failed to parse: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="n">ast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">debugAST</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ast</span><span class="p">.</span><span class="n">print</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
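<p>The <code>.val</code>/<code>.err</code> switch above works because the parser returns a tagged union rather than a Zig error union. A minimal sketch of that pattern, assuming a generic <code>Result</code> type that carries a string error message (the names <code>Result</code> and <code>parseDigit</code> here are illustrative and may not match this project's definitions):</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Assumed shape of a result type: either a value or an error message.
fn Result(comptime T: type) type {
    return union(enum) {
        val: T,
        err: []const u8,
    };
}

// Hypothetical function that reports failure through the union
// instead of through Zig's built-in error sets.
fn parseDigit(c: u8) Result(u8) {
    switch (c) {
        '0'...'9' => return Result(u8){ .val = c - '0' },
        else => return Result(u8){ .err = "Expected a digit" },
    }
}
</pre></div>
<p>A caller must switch on both cases to get at the value, and the error case can carry a human-readable message, which a plain Zig error set cannot.</p>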
<p>Initialize storage.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">db</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">dataDirectory</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">span</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">argv</span><span class="p">[</span><span class="n">databaseArg</span><span class="p">]);</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">RocksDB</span><span class="p">.</span><span class="n">open</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">dataDirectory</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Failed to open database: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">storage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">db</span><span class="p">);</span>
</pre></div>
<p>And execute and print results. :)</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">executor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">execute</span><span class="p">.</span><span class="n">Executor</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">storage</span><span class="p">);</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">executor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="n">ast</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Failed to execute: {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">rows</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"ok</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"| "</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"{s}</span><span class="se">\t\t</span><span class="s">|"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">field</span><span class="p">});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"+ "</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">fieldLen</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">field</span><span class="p">.</span><span class="n">len</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">fieldLen</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"="</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="n">fieldLen</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"</span><span class="se">\t\t</span><span class="s">+"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">rows</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">row</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"| "</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">row</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">cell</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"{s}</span><span class="se">\t\t</span><span class="s">|"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">cell</span><span class="p">});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<h3 id="build.zig">build.zig</h3><p>Finally, tie it all together with <code>build.zig</code>.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">version</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"builtin"</span><span class="p">).</span><span class="n">zig_version</span><span class="p">;</span>
<span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">build</span><span class="p">(</span><span class="n">b</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">build</span><span class="p">.</span><span class="n">Builder</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">exe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">addExecutable</span><span class="p">(</span><span class="s">"main"</span><span class="p">,</span><span class="w"> </span><span class="s">"main.zig"</span><span class="p">);</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">linkLibC</span><span class="p">();</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">linkSystemLibraryName</span><span class="p">(</span><span class="s">"rocksdb"</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">@hasDecl</span><span class="p">(</span><span class="nb">@TypeOf</span><span class="p">(</span><span class="n">exe</span><span class="p">.</span><span class="o">*</span><span class="p">),</span><span class="w"> </span><span class="s">"addLibraryPath"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addLibraryPath</span><span class="p">(</span><span class="s">"./rocksdb"</span><span class="p">);</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addIncludePath</span><span class="p">(</span><span class="s">"./rocksdb/include"</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addLibPath</span><span class="p">(</span><span class="s">"./rocksdb"</span><span class="p">);</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addIncludeDir</span><span class="p">(</span><span class="s">"./rocksdb/include"</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">setOutputDir</span><span class="p">(</span><span class="s">"."</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">exe</span><span class="p">.</span><span class="n">target</span><span class="p">.</span><span class="n">isDarwin</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addRPath</span><span class="p">(</span><span class="s">"."</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">install</span><span class="p">();</span>
<span class="p">}</span>
</pre></div>
<p>Grab RocksDB, build it, and build our CLI.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/facebook/rocksdb
$<span class="w"> </span><span class="o">(</span><span class="w"> </span><span class="nb">cd</span><span class="w"> </span>rocksdb<span class="w"> </span><span class="o">&&</span><span class="w"> </span>make<span class="w"> </span>shared_lib<span class="w"> </span>-j8<span class="w"> </span><span class="o">)</span>
<span class="c1"># ONLY IF YOU ARE ON A MAC</span>
$<span class="w"> </span>cp<span class="w"> </span>rocksdb/*.dylib<span class="w"> </span>.<span class="w"> </span><span class="c1"># ONLY IF YOU ARE ON A MAC</span>
<span class="c1"># DONE ONLY IF YOU ARE ON A MAC</span>
$<span class="w"> </span>zig<span class="w"> </span>build
</pre></div>
<p>And give it a go. :)</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span><<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">"CREATE TABLE y (year int, age int, name text)"</span><span class="o">)</span>
<span class="nb">echo</span><span class="w"> </span><span class="s2">"CREATE TABLE y (year int, age int, name text)"</span>
ok
$<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span><<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">"INSERT INTO y VALUES (2010, 38, 'Gary')"</span><span class="o">)</span>
<span class="nb">echo</span><span class="w"> </span><span class="s2">"INSERT INTO y VALUES (2010, 38, 'Gary')"</span>
ok
$<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span><<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">"INSERT INTO y VALUES (2021, 92, 'Teej')"</span><span class="o">)</span>
<span class="nb">echo</span><span class="w"> </span><span class="s2">"INSERT INTO y VALUES (2021, 92, 'Teej')"</span>
ok
$<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span><<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">"INSERT INTO y VALUES (1994, 18, 'Mel')"</span><span class="o">)</span>
<span class="nb">echo</span><span class="w"> </span><span class="s2">"INSERT INTO y VALUES (1994, 18, 'Mel')"</span>
ok
<span class="c1"># Basic query</span>
$<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span><<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">"SELECT name, age, year FROM y"</span><span class="o">)</span>
<span class="nb">echo</span><span class="w"> </span><span class="s2">"SELECT name, age, year FROM y"</span>
<span class="p">|</span><span class="w"> </span>name<span class="w"> </span><span class="p">|</span>age<span class="w"> </span><span class="p">|</span>year<span class="w"> </span><span class="p">|</span>
+<span class="w"> </span><span class="o">====</span><span class="w"> </span>+<span class="o">===</span><span class="w"> </span>+<span class="o">====</span><span class="w"> </span>+
<span class="p">|</span><span class="w"> </span>Mel<span class="w"> </span><span class="p">|</span><span class="m">18</span><span class="w"> </span><span class="p">|</span><span class="m">1994</span><span class="w"> </span><span class="p">|</span>
<span class="p">|</span><span class="w"> </span>Gary<span class="w"> </span><span class="p">|</span><span class="m">38</span><span class="w"> </span><span class="p">|</span><span class="m">2010</span><span class="w"> </span><span class="p">|</span>
<span class="p">|</span><span class="w"> </span>Teej<span class="w"> </span><span class="p">|</span><span class="m">92</span><span class="w"> </span><span class="p">|</span><span class="m">2021</span><span class="w"> </span><span class="p">|</span>
<span class="c1"># With WHERE</span>
$<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span><<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">"SELECT name, year, age FROM y WHERE age < 40"</span><span class="o">)</span>
<span class="nb">echo</span><span class="w"> </span><span class="s2">"SELECT name, year, age FROM y WHERE age < 40"</span>
<span class="p">|</span><span class="w"> </span>name<span class="w"> </span><span class="p">|</span>year<span class="w"> </span><span class="p">|</span>age<span class="w"> </span><span class="p">|</span>
+<span class="w"> </span><span class="o">====</span><span class="w"> </span>+<span class="o">====</span><span class="w"> </span>+<span class="o">===</span><span class="w"> </span>+
<span class="p">|</span><span class="w"> </span>Mel<span class="w"> </span><span class="p">|</span><span class="m">1994</span><span class="w"> </span><span class="p">|</span><span class="m">18</span><span class="w"> </span><span class="p">|</span>
<span class="p">|</span><span class="w"> </span>Gary<span class="w"> </span><span class="p">|</span><span class="m">2010</span><span class="w"> </span><span class="p">|</span><span class="m">38</span><span class="w"> </span><span class="p">|</span>
<span class="c1"># With operations</span>
$<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span><<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">"SELECT 'Name: ' || name, year + 30, age FROM y WHERE age < 40"</span><span class="o">)</span>
<span class="nb">echo</span><span class="w"> </span><span class="s2">"SELECT 'Name: ' || name, year + 30, age FROM y WHERE age < 40"</span>
<span class="p">|</span><span class="w"> </span>unknown<span class="w"> </span><span class="p">|</span>unknown<span class="w"> </span><span class="p">|</span>age<span class="w"> </span><span class="p">|</span>
+<span class="w"> </span><span class="o">=======</span><span class="w"> </span>+<span class="o">=======</span><span class="w"> </span>+<span class="o">===</span><span class="w"> </span>+
<span class="p">|</span><span class="w"> </span>Name:<span class="w"> </span>Mel<span class="w"> </span><span class="p">|</span><span class="m">2024</span><span class="w"> </span><span class="p">|</span><span class="m">18</span><span class="w"> </span><span class="p">|</span>
<span class="p">|</span><span class="w"> </span>Name:<span class="w"> </span>Gary<span class="w"> </span><span class="p">|</span><span class="m">2040</span><span class="w"> </span><span class="p">|</span><span class="m">38</span><span class="w"> </span><span class="p">|</span>
</pre></div>
<h3 id="from-here">From Here</h3><p>As mentioned, this project is a vast simplification and there are
plenty of bugs and subpar design choices. But hopefully it helps to
make database development feel a little less intimidating!</p>
<p>If you liked this, here are some other things you might want to check
out!</p>
<ul>
<li><a href="https://www.goodreads.com/book/show/23463279-designing-data-intensive-applications">Designing Data-Intensive Applications</a></li>
<li><a href="https://www.goodreads.com/en/book/show/44647144-database-internals">Database Internals: A Deep Dive Into How Distributed Data Systems Work</a></li>
<li><a href="https://reddit.com/r/databasedevelopment">r/databasedevelopment</a></li>
<li><a href="https://eatonphil.com/discord.html">The #dbs channel on a software internals/hacking Discord I run</a></li>
<li><a href="https://github.com/gosql">gosql</a></li>
</ul>
<p>And of course, other posts on this blog. :)</p>
<p>Lastly, a few resources that helped me out while hacking on this:</p>
<ul>
<li><a href="https://ziglang.org/documentation/master/">Zig Documentation</a></li>
<li>Browsing the source code (and tests!!) of standard library data structures</li>
<li><a href="https://discord.gg/gxsFFjE">Zig Programming Language Discord's #zig-help channel</a><ul>
<li>Friendly and helpful crowd :)</li>
</ul>
</li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Spent a month hacking on it and happy to finally have this post out.<br><br>Let's build a basic SQL database in Zig on top of RocksDB. 😃<a href="https://t.co/fkSnaEKsya">https://t.co/fkSnaEKsya</a> <a href="https://t.co/adfpMvvvOn">pic.twitter.com/adfpMvvvOn</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1591974393130934273?ref_src=twsrc%5Etfw">November 14, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/zigrocks-sql.htmlSun, 13 Nov 2022 00:00:00 +0000
- A minimal RocksDB example with Zighttp://notes.eatonphil.com/zigrocks.html<p>I mostly programmed in Go the last few years. So every time I
wanted an embedded key-value database, I reached for Cockroach's
<a href="https://github.com/cockroachdb/pebble">Pebble</a>.</p>
<p>Pebble is great for Go programming but Go does not embed well into
other languages. Pebble was inspired by
<a href="https://github.com/facebook/rocksdb">RocksDB</a> (and its predecessor,
<a href="https://github.com/google/leveldb">LevelDB</a>). Both are written in
C++, which can more easily be embedded into any language with a C
foreign function interface. Pebble also lacks some features that
RocksDB has,
<a href="https://github.com/facebook/rocksdb/wiki/Transactions">transactions</a>
for example.</p>
<p>So I've been wanting to get familiar with RocksDB. And I've been
learning Zig, so I set out to write a simple Zig program that embeds
RocksDB. (If you see weird things in my Zig code and have
suggestions, <a href="mailto:[email protected]">send me a note</a>!)</p>
<p>This post is going to be a mix of RocksDB explanations and Zig
explanations. By the end we'll have a simple CLI over a durable store
that is able to set keys, get keys, and list all key-value pairs
(optionally filtered on a key prefix).</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./kv<span class="w"> </span><span class="nb">set</span><span class="w"> </span>x<span class="w"> </span><span class="m">1</span>
$<span class="w"> </span>./kv<span class="w"> </span>get<span class="w"> </span>x
<span class="m">1</span>
$<span class="w"> </span>./kv<span class="w"> </span><span class="nb">set</span><span class="w"> </span>y<span class="w"> </span><span class="m">22</span>
$<span class="w"> </span>./kv<span class="w"> </span>list<span class="w"> </span>x
<span class="nv">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span>
$<span class="w"> </span>./kv<span class="w"> </span>list<span class="w"> </span>y
<span class="nv">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">22</span>
$<span class="w"> </span>./kv<span class="w"> </span>list
<span class="nv">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span>
<span class="nv">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">22</span>
</pre></div>
<p>Basic stuff!</p>
<p>You can find the code for this post in the <a href="https://github.com/eatonphil/zigrocks">rocksdb.zig file on
GitHub</a>. To simplify things,
this code is only going to work on Linux. And it will require Zig
0.10.x.</p>
<h3 id="rocksdb">RocksDB</h3><p>RocksDB is written in C++. But most languages cannot interface
directly with C++. (Zig cannot either, as far as I understand.) So many
C++ libraries expose a C API that is easier for other programming
languages to interact with. RocksDB does this. Great!</p>
<p>Now RocksDB's <a href="https://github.com/facebook/rocksdb/wiki">C++
documentation</a> is
phenomenal, especially among C++ libraries. But if there is
documentation for the C API, I couldn't find it. Instead you must
trawl through the <a href="https://github.com/facebook/rocksdb/blob/main/include/rocksdb/c.h">C header
file</a>,
the <a href="https://github.com/facebook/rocksdb/blob/main/db/c.cc">C wrapper
implementation</a>,
and the <a href="https://github.com/facebook/rocksdb/blob/main/db/c_test.c">C
tests</a>.</p>
<p>There was also a <a href="https://gist.github.com/nitingupta910/4640638be7e7ad39c41e">great gist showing a minimal RocksDB C
example</a>, but
it didn't cover the iterator API for fetching a range of keys with a
prefix. With the C tests file I was able to figure that out, I
think.</p>
<p>Let's dig in!</p>
<h3 id="creating,-opening-and-closing-a-rocksdb-database">Creating, opening and closing a RocksDB database</h3><p>First we need to import the C header so that Zig can compile-time
verify the foreign functions we call. We'll also import the standard
library that we'll use later.</p>
<p>Aside from <code>build.zig</code> below, all code should be in <code>main.zig</code>.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="kr">const</span><span class="w"> </span><span class="n">rdb</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@cImport</span><span class="p">(</span><span class="nb">@cInclude</span><span class="p">(</span><span class="s">"rocksdb/c.h"</span><span class="p">));</span>
</pre></div>
<p class="note">
Don't read anything into the <code>@</code> other than that this is a compiler
builtin. It's used for imports, casting, and other metaprogramming.
</p><p>Now we can build our wrapper. It will be a Zig struct that contains a
pointer to the RocksDB instance.</p>
<div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">RocksDB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">db</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_t</span><span class="p">,</span>
</pre></div>
<p>To open a database we'll call <code>rocksdb_open()</code> with a directory name
for RocksDB to store data. And we'll tell RocksDB to create the
database if it doesn't already exist.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">open</span><span class="p">(</span><span class="n">dir</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">val</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">RocksDB</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[]</span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">options</span><span class="o">:</span><span class="w"> </span><span class="o">?*</span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_options_t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_options_create</span><span class="p">();</span>
<span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_options_set_create_if_missing</span><span class="p">(</span><span class="n">options</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[</span><span class="o">*:</span><span class="mi">0</span><span class="p">]</span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">db</span><span class="o">:</span><span class="w"> </span><span class="o">?*</span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_open</span><span class="p">(</span><span class="n">options</span><span class="p">,</span><span class="w"> </span><span class="n">dir</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">err</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">span</span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">db</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="o">?</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Finally, we close with <code>rocksdb_close()</code>:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">close</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_close</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>The RocksDB aspect of this is easy. But there's a bunch of
Zig-specific details I should (try to) explain.</p>
<h4 id="return-types">Return types</h4><p>Zig has a cool
<a href="https://ziglang.org/documentation/master/#Errors"><code>error</code></a>
type. <code>try</code>/<code>catch</code> in Zig work only with this <code>error</code> type and
subsets of it you can create. <code>error</code> is an enum. But Zig <code>error</code>s are not
ML-style tagged unions (yet?). That is, you cannot both return an
error and some dynamic information about the error. So the usefulness
of <code>error</code> is limited. It mostly only works if the errors are a finite
set without dynamic aspects.</p>
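<p>To make that limitation concrete, here is a minimal sketch. The
<code>KvError</code> set and the <code>lookup</code> and <code>valueLength</code>
functions are made up for illustration and are not part of this post's
code. It should compile on Zig 0.10.x, as far as I can tell.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical error set, for illustration only.
const KvError = error{
    NotFound,
    Corrupted,
};

fn lookup(key: []const u8) KvError![]const u8 {
    if (std.mem.eql(u8, key, "x")) {
        return "1";
    }
    // Only the error name travels back to the caller. There is nowhere
    // to attach a message such as the text a C library hands back.
    return KvError.NotFound;
}

fn valueLength(key: []const u8) KvError!usize {
    // `try` returns the error to our own caller if `lookup` fails.
    const value = try lookup(key);
    return value.len;
}

pub fn main() void {
    // `catch` supplies a fallback value when an error comes back.
    const found = lookup("x") catch "(missing)";
    const len = valueLength("y") catch 0;
    std.debug.print("{s} {d}\n", .{ found, len });
}
</pre></div>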
<p>Zig also doesn't have multiple return values. But it does have
optional types (denoted with <code>?</code>) and it has anonymous structs.</p>
<p>So we can do a slightly less safe, but more informational, error type
by returning a struct with an optional success value and an optional
error.</p>
<p>That's how we get the return type <code>struct { val: ?RocksDB, err: ?[]u8 }</code>.</p>
<p>This is not very different from Go, certainly no less safe, and I'm
probably biased to use this as a Go programmer.</p>
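<p>Here is the same pattern outside of RocksDB, as a rough sketch. The
<code>ParseResult</code> type and <code>parseDigit</code> function are
hypothetical names I'm using for illustration. I've also given the struct
a name here rather than writing it inline in the function signature,
which is an assumption of this sketch and not how the wrapper above
does it.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical result type. Exactly one of `val` or `err` should be set,
// but nothing in the type system enforces that.
const ParseResult = struct { val: ?u8, err: ?[]const u8 };

fn parseDigit(c: u8) ParseResult {
    if (!std.ascii.isDigit(c)) {
        return .{ .val = null, .err = "not a digit" };
    }
    return .{ .val = c - '0', .err = null };
}

pub fn main() void {
    const r = parseDigit('7');
    // Check the error first, Go-style.
    if (r.err) |msg| {
        std.debug.print("error: {s}\n", .{msg});
        return;
    }
    // No error, so by convention `val` is set and `.?` is safe.
    std.debug.print("value: {d}\n", .{r.val.?});
}
</pre></div>
<p>The caller has to remember to check <code>err</code> before touching
<code>val</code>, which is the "slightly less safe" part.</p>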
<p class="note">
Felix Queißner pointed out to me that Zig has tagged unions, which
would be safer here. Instead of <code>struct { val: ?RocksDB, err: ?[]u8 }</code>
I could use <code>union(enum) { val: RocksDB, err: []u8 }</code>. When I get a
chance to play with that syntax I'll modify this post.
</p><h4 id="optional-pointers">Optional pointers</h4><p>The next thing you may notice is <code>?*rdb.rocksdb_options_t</code> and
<code>?*rdb.rocksdb_t</code>. This is to work with Zig's type system. Zig expects
that pointers are not null. By adding <code>?</code> we are telling Zig that this
value can be null. That way the Zig type system will force us to
handle the null condition if we try to access fields on the value.</p>
<p>In the options case, it doesn't really matter if the result is <code>null</code>
or not. In the database case, we handle null-ness by checking the
error value with <code>if (err != null)</code>. If this condition is <em>not</em> met, we
know the database is not null. So we use <code>db.?</code> to assert and return a
value that, in the type system, is not null.</p>
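<p>A small standalone sketch of both ways to unwrap an optional pointer
may help. <code>Node</code> and <code>maybeNode</code> are hypothetical
stand-ins for a C type and a C function that can return
<code>NULL</code>. They are not part of the RocksDB API.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

const Node = struct { value: u32 };

// Hypothetical stand-in for a C function that may hand back NULL.
fn maybeNode(node: *Node, present: bool) ?*Node {
    return if (present) node else null;
}

pub fn main() void {
    var storage = Node{ .value = 41 };

    // Capture form: the body only runs when the pointer is non-null,
    // and `n` has the non-optional type *Node inside it.
    if (maybeNode(&storage, true)) |n| {
        n.value += 1;
    }

    // `.?` asserts non-null. In safe build modes a wrong assertion panics.
    const unwrapped = maybeNode(&storage, true).?;
    std.debug.print("value: {d}\n", .{unwrapped.value});
}
</pre></div>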
<h4 id="zig-strings,-c-strings">Zig strings, C strings</h4><p>Another thing you may notice is <code>var err:
?[*:0]u8 = null;</code>. Zig strings are expressed as byte arrays or byte
slices. <code>[]u8</code> and <code>[]const u8</code> are slices that keep track of the
number of items. <code>[*:0]u8</code> is <em>not</em> a byte slice. It carries no length and
is only null-terminated. To go from the null-terminated pointer that the C
API returns to the <code>[]u8</code> (slice that contains length) in our
function's return signature we use
<a href="https://github.com/ziglang/zig/blob/30b8b29f88362d18ea6523a859b29f7bc6dec622/lib/std/mem.zig"><code>std.mem.span</code></a>.</p>
<p><a href="https://stackoverflow.com/questions/72736997/how-to-pass-a-c-string-into-a-zig-function-expecting-a-zig-string">This StackOverflow
post</a>
was useful for understanding this.</p>
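<p>As a quick sketch of that conversion on its own, assuming nothing
beyond the standard library (the <code>c_str</code> value here is just a
string literal standing in for whatever a C API would return):</p>
<div class="highlight"><pre><span></span>const std = @import("std");

pub fn main() void {
    // A string literal coerces to a null-terminated many-item pointer,
    // the same shape as the `[*:0]u8` used for the error string above.
    const c_str: [*:0]const u8 = "hello";

    // std.mem.span scans for the terminating 0 and returns a slice
    // that knows its own length.
    const slice: []const u8 = std.mem.span(c_str);

    std.debug.print("{s} has {d} bytes\n", .{ slice, slice.len });
}
</pre></div>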
<h4 id="structs">Structs</h4><p>Anonymous structs in Zig are prefixed with a <code>.</code>. And all struct
fields, anonymous or not, are prefixed with <code>.</code>.</p>
<p>So <code>.{.x = 1}</code> instantiates an anonymous struct that has one field
<code>x</code>.</p>
<p>Struct fields in Zig cannot be left uninitialized (unless the field
declares a default value), even if they are nullable. And when you
initialize a nullable value you don't need to
wrap it in a <code>Some()</code> like you might do in an ML.</p>
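<p>For example, in this small sketch (<code>Point</code> is a hypothetical
type, not something from the wrapper) both the plain field and the
nullable field are spelled out every time:</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical struct with one plain field and one nullable field.
const Point = struct {
    x: i32,
    label: ?[]const u8,
};

pub fn main() void {
    // A plain value coerces to the optional type; no wrapper is needed.
    const a: Point = .{ .x = 1, .label = "origin" };
    // The nullable field still has to be named, even when it is null.
    const b: Point = .{ .x = 2, .label = null };

    // Omitting `.label` would be a compile error here, since the field
    // declares no default value.

    std.debug.print("{d} {s}\n", .{ a.x, a.label.? });
    std.debug.print("{d} {}\n", .{ b.x, b.label == null });
}
</pre></div>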
<p>One thing I found surprising about Zig anonymous structs is that
instances of the anonymous <em>type</em> are created per function and two
anonymous structs that are structurally identical but referenced in
different functions are not actually type-equal.</p>
<p>So this doesn't compile:</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="k">test</span><span class="p">.</span><span class="n">zig</span>
<span class="k">fn</span><span class="w"> </span><span class="n">doA</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{.</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">};</span>
<span class="p">}</span>
<span class="k">fn</span><span class="w"> </span><span class="n">doB</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">doA</span><span class="p">();</span>
<span class="p">}</span>
<span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">doB</span><span class="p">();</span>
<span class="p">}</span>
<span class="err">$</span><span class="w"> </span><span class="n">zig</span><span class="w"> </span><span class="n">build</span><span class="o">-</span><span class="n">exe</span><span class="w"> </span><span class="k">test</span><span class="p">.</span><span class="n">zig</span>
<span class="k">test</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">5</span><span class="o">:</span><span class="mi">15</span><span class="o">:</span><span class="w"> </span><span class="k">error</span><span class="o">:</span><span class="w"> </span><span class="n">expected</span><span class="w"> </span><span class="kt">type</span><span class="w"> </span><span class="err">'</span><span class="k">test</span><span class="p">.</span><span class="n">doB__struct_2890</span><span class="err">'</span><span class="p">,</span><span class="w"> </span><span class="n">found</span><span class="w"> </span><span class="err">'</span><span class="k">test</span><span class="p">.</span><span class="n">doA__struct_3878</span><span class="err">'</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">doA</span><span class="p">();</span>
<span class="w"> </span><span class="o">~~~^~</span>
<span class="k">test</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">1</span><span class="o">:</span><span class="mi">10</span><span class="o">:</span><span class="w"> </span><span class="n">note</span><span class="o">:</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="n">declared</span><span class="w"> </span><span class="n">here</span>
<span class="k">fn</span><span class="w"> </span><span class="n">doA</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">^~~~~~~~~~~~~~~~</span>
<span class="k">test</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">4</span><span class="o">:</span><span class="mi">10</span><span class="o">:</span><span class="w"> </span><span class="n">note</span><span class="o">:</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="n">declared</span><span class="w"> </span><span class="n">here</span>
<span class="k">fn</span><span class="w"> </span><span class="n">doB</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">^~~~~~~~~~~~~~~~</span>
<span class="k">test</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">4</span><span class="o">:</span><span class="mi">10</span><span class="o">:</span><span class="w"> </span><span class="n">note</span><span class="o">:</span><span class="w"> </span><span class="n">function</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kt">type</span><span class="w"> </span><span class="n">declared</span><span class="w"> </span><span class="n">here</span>
<span class="k">fn</span><span class="w"> </span><span class="n">doB</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">^~~~~~~~~~~~~~~~</span>
<span class="n">referenced</span><span class="w"> </span><span class="n">by</span><span class="o">:</span>
<span class="w"> </span><span class="n">main</span><span class="o">:</span><span class="w"> </span><span class="k">test</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">8</span><span class="o">:</span><span class="mi">9</span>
<span class="w"> </span><span class="n">callMain</span><span class="o">:</span><span class="w"> </span><span class="o">/</span><span class="n">whatever</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">std</span><span class="o">/</span><span class="n">start</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">606</span><span class="o">:</span><span class="mi">32</span>
<span class="w"> </span><span class="n">remaining</span><span class="w"> </span><span class="n">reference</span><span class="w"> </span><span class="n">traces</span><span class="w"> </span><span class="n">hidden</span><span class="p">;</span><span class="w"> </span><span class="n">use</span><span class="w"> </span><span class="err">'</span><span class="o">-</span><span class="n">freference</span><span class="o">-</span><span class="n">trace</span><span class="err">'</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">see</span><span class="w"> </span><span class="n">all</span><span class="w"> </span><span class="n">reference</span><span class="w"> </span><span class="n">traces</span>
</pre></div>
<p>Each anonymous struct declaration is its own distinct type, even when the fields are identical. So you would need to instantiate a new anonymous struct in the second function.</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="k">test</span><span class="p">.</span><span class="n">zig</span>
<span class="k">fn</span><span class="w"> </span><span class="n">doA</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{.</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">};</span>
<span class="p">}</span>
<span class="k">fn</span><span class="w"> </span><span class="n">doB</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">doA</span><span class="p">().</span><span class="n">y</span><span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
<span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">doB</span><span class="p">();</span>
<span class="p">}</span>
</pre></div>
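<p>If the two functions really are meant to return the same thing, another option is to give the struct a name and use it in both signatures. Here is a minimal sketch; <code>Result</code> is a name made up for this example and is not part of the RocksDB code.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical named type shared by both functions.
const Result = struct { y: u8 };

fn doA() Result {
    return .{ .y = 1 };
}

fn doB() Result {
    // Both signatures now name the same type, so this type-checks.
    return doA();
}

pub fn main() void {
    std.debug.assert(doB().y == 1);
}
</pre></div>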
<h4 id="uniform-function-call-syntax">Uniform function call syntax</h4><p>Zig seems to support something like <a href="https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax">uniform function call
syntax</a>
where you can either call a function with all of its arguments or omit
the first argument by prefixing the function call with
<code>firstargument.</code>. I.e. <code>x.add(y)</code> and <code>add(x, y)</code>. This only works for functions declared inside the type of <code>firstargument</code>, though, so it is closer to method call syntax than to full UFCS.</p>
<p>In the case of this code it would be <code>RocksDB.close(db)</code> vs
<code>db.close()</code> assuming <code>db</code> is an instance of the <code>RocksDB</code> struct.</p>
<p>Like Python, the use of <code>self</code> as the name of this first parameter of
a struct's methods is purely convention. You can call it whatever.</p>
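<p>Here is a small sketch of both call styles. The <code>Counter</code> struct is hypothetical, and its first parameter is deliberately named <code>me</code> to show that the name <code>self</code> is not special.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical struct used only to illustrate the two call styles.
const Counter = struct {
    n: u32,

    // The first parameter could be called anything; `self` is only a convention.
    fn add(me: Counter, other: u32) u32 {
        return me.n + other;
    }
};

pub fn main() void {
    const c = Counter{ .n = 1 };
    // Method-style call: `c` is passed as the first argument.
    std.debug.assert(c.add(2) == 3);
    // Plain function call through the struct's namespace.
    std.debug.assert(Counter.add(c, 2) == 3);
}
</pre></div>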
<p>The point is that we always expect the user to call <code>var db = RocksDB.open()</code> to open
the database, and we allow the user to call <code>db.close()</code> to close it.</p>
<p>Let's move on!</p>
<h3 id="setting-a-key-value-pair">Setting a key-value pair</h3><p>We set a pair by calling <code>rocksdb_put</code> with the database instance,
some options (which we'll leave at their defaults), and the key and value strings
as C strings.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">set</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">,</span><span class="w"> </span><span class="n">key</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="p">[]</span><span class="kt">u8</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">writeOptions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_writeoptions_create</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[</span><span class="o">*:</span><span class="mi">0</span><span class="p">]</span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_put</span><span class="p">(</span>
<span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">,</span>
<span class="w"> </span><span class="n">writeOptions</span><span class="p">,</span>
<span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span>
<span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">len</span><span class="p">,</span>
<span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span>
<span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">len</span><span class="p">,</span>
<span class="w"> </span><span class="o">&</span><span class="n">err</span><span class="p">,</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">errStr</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">span</span><span class="p">(</span><span class="n">errStr</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
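<p>The error handling in there leans on two Zig features: unwrapping an optional with <code>if (err) |errStr|</code>, and <code>std.mem.span</code> to turn a null-terminated pointer into a slice with a known length. A minimal sketch of just that part, with a string literal standing in for the error RocksDB would write (an assumption for illustration only):</p>
<div class="highlight"><pre><span></span>const std = @import("std");

pub fn main() void {
    // Stand-in for the error out-parameter after a failed C call.
    // The real code uses a mutable ?[*:0]u8; const is enough for this sketch.
    const err: ?[*:0]const u8 = "oops";

    if (err) |errStr| {
        // std.mem.span scans for the 0 terminator and returns a slice.
        const msg = std.mem.span(errStr);
        std.debug.assert(msg.len == 4);
        std.debug.assert(std.mem.eql(u8, msg, "oops"));
    }
}
</pre></div>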
<p>The only Zig-specific detail is <code>key.ptr</code>, which is there to satisfy the Zig /
C type system. The type signatures <code>key: [:0]const u8</code> and <code>value:
[:0]const u8</code> make sure that the user passes in null-terminated
byte slices, which is what the RocksDB API expects.</p>
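<p>To make the <code>[:0]const u8</code> type a bit more concrete, here is a small sketch of what such a slice gives you. It uses only a string literal, so nothing in it is specific to RocksDB.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

pub fn main() void {
    // String literals coerce to sentinel-terminated slices.
    const key: [:0]const u8 = "name";

    // The length does not count the terminator...
    std.debug.assert(key.len == 4);
    // ...but the byte just past the end is guaranteed to be 0.
    std.debug.assert(key[key.len] == 0);

    // .ptr drops the length and leaves a pointer a C function can take.
    const p: [*:0]const u8 = key.ptr;
    std.debug.assert(p[0] == 'n');
}
</pre></div>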
<h3 id="getting-a-value-from-a-key">Getting a value from a key</h3><p>We get a value by calling <code>rocksdb_get</code> with the database instance,
some options (which we'll again leave at their defaults), and the key as a C
string.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">get</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">,</span><span class="w"> </span><span class="n">key</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">val</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[]</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[]</span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">readOptions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_readoptions_create</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">valueLength</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[</span><span class="o">*:</span><span class="mi">0</span><span class="p">]</span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_get</span><span class="p">(</span>
<span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">,</span>
<span class="w"> </span><span class="n">readOptions</span><span class="p">,</span>
<span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span>
<span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">len</span><span class="p">,</span>
<span class="w"> </span><span class="o">&</span><span class="n">valueLength</span><span class="p">,</span>
<span class="w"> </span><span class="o">&</span><span class="n">err</span><span class="p">,</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">errStr</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">span</span><span class="p">(</span><span class="n">errStr</span><span class="p">)</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">v</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">..</span><span class="n">valueLength</span><span class="p">],</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>One thing in there to call out is that we can go from the C pointer
<code>v</code>, which carries no length of its own, to a standard Zig slice <code>[]u8</code> by slicing from <code>0</code> to the
length of the value returned by the C API.</p>
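<p>The same pointer-plus-length to slice conversion can be sketched without RocksDB. Here a string literal and a hard-coded length stand in for what the C API hands back (both are assumptions made for the example):</p>
<div class="highlight"><pre><span></span>const std = @import("std");

pub fn main() void {
    // Stand-ins for what a C function returns: a pointer that carries
    // no length, plus the length reported separately.
    const storage = "hello world";
    const v: [*]const u8 = storage;
    const valueLength: usize = 5;

    // Slicing the pointer with explicit bounds yields a normal Zig slice.
    const value: []const u8 = v[0..valueLength];
    std.debug.assert(value.len == 5);
    std.debug.assert(std.mem.eql(u8, value, "hello"));
}
</pre></div>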
<p>Also, <code>rocksdb_get</code> is only used for getting a single key-value
pair. We'll handle key-value pair iteration next.</p>
<h3 id="iterating-over-key-value-pairs">Iterating over key-value pairs</h3><p>The basic structure of RocksDB's iterator API is that you first create
an iterator instance with <code>rocksdb_create_iterator()</code>. Then you either
<code>rocksdb_iter_seek_to_first()</code> or <code>rocksdb_iter_seek()</code> (with a
prefix) to get the iterator ready. Then you get the current iterator
entry's key with <code>rocksdb_iter_key()</code> and value with
<code>rocksdb_iter_value()</code>. You move on to the next entry in the iterator
with <code>rocksdb_iter_next()</code> and check that the current iterator value
is valid with <code>rocksdb_iter_valid()</code>. When the iterator is no longer
valid, or if you want to stop iterating early, you call
<code>rocksdb_iter_destroy()</code>.</p>
<p>But we'd like to present a Zig-only interface to users of the
<code>RocksDB</code> Zig struct. So we'll create a <code>RocksDB.iter()</code> function that
returns a <code>RocksDB.Iter</code> with a <code>RocksDB.Iter.next()</code> function that
will return an optional <code>RocksDB.IterEntry</code>.</p>
<p>We'll start backwards with that <code>RocksDB.Iter</code> struct.</p>
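<h4 id="<code>rocksdb.iter</code>"><code>RocksDB.Iter</code></h4><p>Each iterator instance will store a pointer to a RocksDB iterator
instance. It will store the prefix requested (which is allowed to be
an empty string). If the prefix is set though, we'll only iterate
while the iterator key has the requested prefix. It will also store a <code>first</code>
flag so that the first call to <code>next()</code> returns the entry the iterator is
already positioned on rather than skipping past it.</p>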
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">IterEntry</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">key</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">value</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">Iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">iter</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="n">rdb</span><span class="o">.</span><span class="n">rocksdb_iterator_t</span><span class="p">,</span>
<span class="w"> </span><span class="n">first</span><span class="p">:</span><span class="w"> </span><span class="nb nb-Type">bool</span><span class="p">,</span>
<span class="w"> </span><span class="n">prefix</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="p">,</span>
<span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">next</span><span class="p">(</span><span class="bp">self</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="n">Iter</span><span class="p">)</span><span class="w"> </span><span class="err">?</span><span class="n">IterEntry</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="bp">self</span><span class="o">.</span><span class="n">first</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">rdb</span><span class="o">.</span><span class="n">rocksdb_iter_next</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">iter</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="bp">self</span><span class="o">.</span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="bp">false</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rdb</span><span class="o">.</span><span class="n">rocksdb_iter_valid</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">iter</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb nb-Type">null</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">keySize</span><span class="p">:</span><span class="w"> </span><span class="n">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="o">.</span><span class="n">rocksdb_iter_key</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">iter</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">keySize</span><span class="p">);</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Make</span><span class="w"> </span><span class="n">sure</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">still</span><span class="w"> </span><span class="n">within</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">prefix</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">prefix</span><span class="o">.</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">prefix</span><span class="o">.</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">keySize</span><span class="w"> </span><span class="ow">or</span>
<span class="w"> </span><span class="o">!</span><span class="n">std</span><span class="o">.</span><span class="n">mem</span><span class="o">.</span><span class="n">eql</span><span class="p">(</span><span class="n">u8</span><span class="p">,</span><span class="w"> </span><span class="n">key</span><span class="p">[</span><span class="mf">0.</span><span class="o">.</span><span class="n">self</span><span class="o">.</span><span class="n">prefix</span><span class="o">.</span><span class="n">len</span><span class="p">],</span><span class="w"> </span><span class="bp">self</span><span class="o">.</span><span class="n">prefix</span><span class="p">))</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb nb-Type">null</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">valueSize</span><span class="p">:</span><span class="w"> </span><span class="n">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="o">.</span><span class="n">rocksdb_iter_value</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">iter</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">valueSize</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">IterEntry</span><span class="p">{</span>
<span class="w"> </span><span class="o">.</span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">key</span><span class="p">[</span><span class="mf">0.</span><span class="o">.</span><span class="n">keySize</span><span class="p">],</span>
<span class="w"> </span><span class="o">.</span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">[</span><span class="mf">0.</span><span class="o">.</span><span class="n">valueSize</span><span class="p">],</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Finally we'll wrap the <code>rocksdb_iter_destroy()</code> function:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">close</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Iter</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_iter_destroy</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">iter</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
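<p>Because <code>next()</code> returns an optional, callers can drive the iterator with a plain <code>while</code> loop that stops at the first <code>null</code>. The sketch below shows that shape using a hypothetical in-memory <code>ToyIter</code> over a sorted list of keys. It is a stand-in for illustration only and makes no RocksDB calls.</p>
<div class="highlight"><pre><span></span>const std = @import("std");

// Hypothetical in-memory stand-in for RocksDB.Iter: sorted keys plus a cursor.
const ToyIter = struct {
    keys: []const []const u8,
    pos: usize,
    first: bool,
    prefix: []const u8,

    fn next(self: *ToyIter) ?[]const u8 {
        // Advance on every call except the first one.
        if (!self.first) {
            self.pos += 1;
        }
        self.first = false;

        // Plays the role of the validity check.
        if (self.pos >= self.keys.len) {
            return null;
        }

        // Stop once the key leaves the requested prefix.
        const key = self.keys[self.pos];
        if (!std.mem.startsWith(u8, key, self.prefix)) {
            return null;
        }
        return key;
    }
};

pub fn main() void {
    const keys = [_][]const u8{ "a:1", "b:1", "b:2", "c:1" };
    // Starting at pos = 1 plays the role of seeking to the first key at or after "b:".
    var it = ToyIter{ .keys = &keys, .pos = 1, .first = true, .prefix = "b:" };

    var count: usize = 0;
    while (it.next()) |key| {
        std.debug.assert(std.mem.startsWith(u8, key, "b:"));
        count += 1;
    }
    std.debug.assert(count == 2);
}
</pre></div>
<p>The loop body only ever sees keys inside the prefix, which is the same contract the real <code>RocksDB.Iter.next()</code> aims to provide.</p>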
<h4 id="<code>rocksdb.iter()</code>"><code>RocksDB.iter()</code></h4><p>Now we can write the function that creates the <code>RocksDB.Iter</code>. As
previously mentioned we must first instantiate the RocksDB iterator
and then <code>seek</code>: to the first entry if the user doesn't request
a prefix, or to the first key at or after the prefix if the user does
request one.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">iter</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">,</span><span class="w"> </span><span class="n">prefix</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">val</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Iter</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">readOptions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_readoptions_create</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">it</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Iter</span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="p">.</span><span class="n">prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">prefix</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_create_iterator</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">,</span><span class="w"> </span><span class="n">readOptions</span><span class="p">))</span><span class="w"> </span><span class="o">|</span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">it</span><span class="p">.</span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"Could not create iterator"</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">prefix</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_iter_seek</span><span class="p">(</span>
<span class="w"> </span><span class="n">it</span><span class="p">.</span><span class="n">iter</span><span class="p">,</span>
<span class="w"> </span><span class="n">prefix</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span>
<span class="w"> </span><span class="n">prefix</span><span class="p">.</span><span class="n">len</span><span class="p">,</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_iter_seek_to_first</span><span class="p">(</span><span class="n">it</span><span class="p">.</span><span class="n">iter</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">it</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">};</span>
</pre></div>
<p>And now we're done with a basic Zig wrapper for the RocksDB API!</p>
<h3 id="<code>main</code>"><code>main</code></h3><p>Next we write a simple command-line entrypoint that uses the RocksDB
wrapper we built. This is not the prettiest code but it gets the job
done.</p>
<div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">openRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">.</span><span class="n">open</span><span class="p">(</span><span class="s">"/tmp/db"</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">openRes</span><span class="p">.</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Failed to open: {s}.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">openRes</span><span class="p">.</span><span class="n">val</span><span class="p">.</span><span class="o">?</span><span class="p">;</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">process</span><span class="p">.</span><span class="n">args</span><span class="p">();</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">();</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">key</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">command</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"get"</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">arg</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">"set"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">command</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"set"</span><span class="p">;</span>
<span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">().</span><span class="o">?</span><span class="p">;</span>
<span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">().</span><span class="o">?</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">"get"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">command</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"get"</span><span class="p">;</span>
<span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">().</span><span class="o">?</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">"list"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">command</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"lst"</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">argNext</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">argNext</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Must specify command (get, set, or list). Got: '{s}'.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">arg</span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">command</span><span class="p">,</span><span class="w"> </span><span class="s">"set"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">setErr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">setErr</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Error setting key: {s}.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">command</span><span class="p">,</span><span class="w"> </span><span class="s">"get"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">getRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">getRes</span><span class="p">.</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Error getting key: {s}.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">getRes</span><span class="p">.</span><span class="n">val</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">v</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"{s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">v</span><span class="p">});</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Key not found.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">key</span><span class="p">;</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">iterRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="n">iter</span><span class="p">(</span><span class="n">prefix</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">iterRes</span><span class="p">.</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"Error getting iterator: {s}.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">iterRes</span><span class="p">.</span><span class="n">val</span><span class="p">.</span><span class="o">?</span><span class="p">;</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">iter</span><span class="p">.</span><span class="n">close</span><span class="p">();</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">iter</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">entry</span><span class="o">|</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">"{s} = {s}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">entry</span><span class="p">.</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">entry</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Notably, the <code>main</code> function must be marked <code>pub</code>. The struct and
struct methods we wrote would need to be marked <code>pub</code> if we wanted
them accessible from other files. But since this is a single file,
<code>pub</code> doesn't matter. Except for <code>main</code>.</p>
<p>Now we can get into building.</p>
<h3 id="building">Building</h3><p>First we need to compile the RocksDB library. To do this we simply
<code>git clone</code> RocksDB and run <code>make shared_lib</code>.</p>
<h4 id="compiling-rocksdb">Compiling RocksDB</h4><div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/facebook/rocksdb
$<span class="w"> </span><span class="o">(</span><span class="w"> </span><span class="nb">cd</span><span class="w"> </span>rocksdb<span class="w"> </span><span class="o">&&</span><span class="w"> </span>make<span class="w"> </span>shared_lib<span class="w"> </span>-j8<span class="w"> </span><span class="o">)</span>
</pre></div>
<p>This may take a while, sorry.</p>
<h4 id="<code>build.zig</code>"><code>build.zig</code></h4><p>Next we need to write a <code>build.zig</code> script that tells Zig about this
external library. This was one of the harder parts of the process, but
building and linking against foreign libraries is almost always hard.</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="n">build</span><span class="p">.</span><span class="n">zig</span>
<span class="kr">const</span><span class="w"> </span><span class="n">version</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"builtin"</span><span class="p">).</span><span class="n">zig_version</span><span class="p">;</span>
<span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">build</span><span class="p">(</span><span class="n">b</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">build</span><span class="p">.</span><span class="n">Builder</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">exe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">addExecutable</span><span class="p">(</span><span class="s">"main"</span><span class="p">,</span><span class="w"> </span><span class="s">"main.zig"</span><span class="p">);</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">linkLibC</span><span class="p">();</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">linkSystemLibraryName</span><span class="p">(</span><span class="s">"rocksdb"</span><span class="p">);</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addLibraryPath</span><span class="p">(</span><span class="s">"./rocksdb"</span><span class="p">);</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addIncludePath</span><span class="p">(</span><span class="s">"./rocksdb/include"</span><span class="p">);</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">setOutputDir</span><span class="p">(</span><span class="s">"."</span><span class="p">);</span>
<span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">install</span><span class="p">();</span>
<span class="p">}</span>
</pre></div>
<p>Felix Queißner's <a href="https://zig.news/xq/zig-build-explained-part-3-1ima">zig build
explained</a> series
was quite helpful.</p>
<p>Now we just:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build
</pre></div>
<p>And run!</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./main<span class="w"> </span>list
$<span class="w"> </span>./main<span class="w"> </span><span class="nb">set</span><span class="w"> </span>x<span class="w"> </span><span class="m">12</span>
$<span class="w"> </span>./main<span class="w"> </span><span class="nb">set</span><span class="w"> </span>xy<span class="w"> </span><span class="m">300</span>
$<span class="w"> </span>./main<span class="w"> </span>list
<span class="nv">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span>
<span class="nv">xy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">300</span>
$<span class="w"> </span>./main<span class="w"> </span>get<span class="w"> </span>xy
<span class="m">300</span>
$<span class="w"> </span>./main<span class="w"> </span>list<span class="w"> </span>xy
<span class="nv">xy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">300</span>
</pre></div>
<p>Not bad!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new post on using RocksDB with Zig! There weren't a lot of good examples of the C API and it was good practice for learning Zig.<br><br>Also sets me up for integrating it in a (WIP) port of my toy SQL database from Go to Zig. (This time with storage!)<a href="https://t.co/zquojV974G">https://t.co/zquojV974G</a> <a href="https://t.co/gtAsB6Wrhi">pic.twitter.com/gtAsB6Wrhi</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1586908890960117760?ref_src=twsrc%5Etfw">October 31, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/zigrocks.htmlSun, 30 Oct 2022 00:00:00 +0000
- A database without dynamic memory allocationhttp://notes.eatonphil.com/a-database-without-dynamic-memory.html<head>
<meta http-equiv="refresh" content="4;URL='https://tigerbeetle.com/blog/a-database-without-dynamic-memory/'" />
</head><p>This is an external post of mine. Click
<a href="https://tigerbeetle.com/blog/a-database-without-dynamic-memory/">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/a-database-without-dynamic-memory.htmlWed, 12 Oct 2022 00:00:00 +0000
- A minimal distributed key-value database with Hashicorp's Raft libraryhttp://notes.eatonphil.com/minimal-key-value-store-with-hashicorp-raft.html<p>When I wrote the "<a href="/distributed-postgres.html">build a distributed PostgreSQL proof of
concept</a>" post I first had to figure out
how to use <a href="https://github.com/hashicorp/raft">Hashicorp's Raft
implementation</a>.</p>
<p>There weren't any examples I could find in the Hashicorp repo
itself. And the only example I <em>could</em> find was Philip O'Toole's
<a href="https://github.com/otoolep/hraftd">hraftd</a>. It's great! However, I
have a hard time following multi-file examples in general.</p>
<p>So I built my own <a href="https://github.com/eatonphil/raft-example">single-file
example</a>. It's not perfect
but it helped me get started and may help you too. We'll walk through
that code, ~260 lines of Go, in this post.</p>
<p>The key-value database will only be able to set keys, not delete
them. But it will be able to overwrite existing entries. And it will
expose this distributed key-value database over an HTTP API.</p>
<p>Here's a sample interaction it will be able to support.</p>
<p>Terminal 1:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./raft-example<span class="w"> </span>--node-id<span class="w"> </span>node1<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2222</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8222</span>
</pre></div>
<p>Terminal 2:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./raft-example<span class="w"> </span>--node-id<span class="w"> </span>node2<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2223</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8223</span>
</pre></div>
<p>Terminal 3, tell 1 to have 2 follow it:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span><span class="s1">'localhost:8222/join?followerAddr=localhost:2223&followerId=node2'</span>
</pre></div>
<p>Terminal 3, now add a key:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span><span class="s1">'localhost:8222/set'</span><span class="w"> </span>-d<span class="w"> </span><span class="s1">'{"key": "x", "value": "23"}'</span><span class="w"> </span>-H<span class="w"> </span><span class="s1">'content-type: application/json'</span>
</pre></div>
<p>Terminal 3, now get the key from either server:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span><span class="s1">'localhost:8222/get?key=x'</span>
<span class="o">{</span><span class="s2">"data"</span>:<span class="s2">"23"</span><span class="o">}</span>
$<span class="w"> </span>curl<span class="w"> </span><span class="s1">'localhost:8223/get?key=x'</span>
<span class="o">{</span><span class="s2">"data"</span>:<span class="s2">"23"</span><span class="o">}</span>
</pre></div>
<p>Let's make it happen!</p>
<h3 id="eine-kleine-background">Eine kleine background</h3><p>Raft is an algorithm for managing a replicated (basically append-only)
log over a cluster of nodes. When you combine this with a state
machine you get a stateful, distributed application. Log entries act
as commands for the state machine. When a node in the Raft cluster
crashes, it is brought up to date by sending (also called "replaying")
all commands in the log through the state machine.</p>
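<p>To make the replay idea concrete, here is a minimal sketch that does not use the Raft library at all. The <code>command</code> type and <code>replay</code> function are hypothetical names made up for illustration; the state machine is just a Go map.</p>

```go
package main

import "fmt"

// command is a hypothetical log entry: "set Key to Value".
type command struct {
	Key   string
	Value string
}

// replay sends every command in the log, in order, through a fresh
// state machine (here a plain map) and returns the resulting state.
func replay(entries []command) map[string]string {
	state := map[string]string{}
	for _, c := range entries {
		state[c.Key] = c.Value
	}
	return state
}

func main() {
	entries := []command{
		{Key: "x", Value: "1"},
		{Key: "y", Value: "2"},
		{Key: "x", Value: "23"}, // a later entry overwrites the earlier x
	}
	state := replay(entries)
	fmt.Println(state["x"], state["y"])
}
```

<p>Any node that replays the same log in the same order ends up with the same map, which is the property the rest of this post relies on.</p>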
<p>This can be made more efficient by implementing an
application-specific concept of state snapshots. But since snapshots
are just an optimization, we'll skip it entirely to keep this
application simple.</p>
<p>If you want the details, just <a href="https://raft.github.io/raft.pdf">read the Raft
paper</a>! It is surprisingly
accessible, especially as a user.</p>
<h3 id="our-app">Our app</h3><p>In our distributed key-value application, commands will be a
serialized struct with a key and a value. The state machine will take
each struct and set the key to the value in memory. Thus after
replaying the entire log (and continuing to apply future log entries),
each node will have an in-memory key-value store that is up to date
with all other nodes in the cluster.</p>
<p>Note that although each node's key-value store will only be in memory,
it will be backed by the durable append-only log! So with Raft, each
in-memory key-value store will still be durable.</p>
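<p>As a rough sketch of what "a serialized struct with a key and a value" could look like, the snippet below round-trips a command through JSON and applies it to a map. The names <code>setCommand</code>, <code>encode</code> and <code>decodeAndApply</code> are assumptions for illustration, as is the choice of JSON tags; this is standard-library only and is not the code used later in the post.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// setCommand is a hypothetical command: set Key to Value.
type setCommand struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

// encode serializes a command into the bytes that would be
// appended to the replicated log.
func encode(c setCommand) ([]byte, error) {
	return json.Marshal(c)
}

// decodeAndApply parses one serialized command and applies it
// to the in-memory store.
func decodeAndApply(store map[string]string, data []byte) error {
	var c setCommand
	if err := json.Unmarshal(data, &c); err != nil {
		return fmt.Errorf("could not parse command: %w", err)
	}
	store[c.Key] = c.Value
	return nil
}

func main() {
	store := map[string]string{}
	data, err := encode(setCommand{Key: "x", Value: "23"})
	if err != nil {
		panic(err)
	}
	if err := decodeAndApply(store, data); err != nil {
		panic(err)
	}
	fmt.Println(string(data), store["x"])
}
```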
<p>Let's get things set up in a file, <code>main.go</code>.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"encoding/json"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"io"</span>
<span class="w"> </span><span class="s">"log"</span>
<span class="w"> </span><span class="s">"net"</span>
<span class="w"> </span><span class="s">"net/http"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"path"</span>
<span class="w"> </span><span class="s">"sync"</span>
<span class="w"> </span><span class="s">"time"</span>
<span class="w"> </span><span class="s">"github.com/hashicorp/raft"</span>
<span class="w"> </span><span class="s">"github.com/hashicorp/raft-boltdb"</span>
<span class="p">)</span>
</pre></div>
<h3 id="the-state-machine">The state machine</h3><p>The state machine acts on an in-memory key-value store.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">kvFsm</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span>
<span class="p">}</span>
</pre></div>
<p>There are three operations this Raft library wants us to implement on
our state machine struct.</p>
<h4 id="apply">Apply</h4><p>The Apply operation is sent to basically-up-to-date nodes to keep them
up to date. An Apply call is made for each new log the leader commits.</p>
<p>Each log message will contain a key and value to store in the
in-memory key-value store.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">setPayload</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Key</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">Value</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">kf</span><span class="w"> </span><span class="o">*</span><span class="nx">kvFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Apply</span><span class="p">(</span><span class="nx">log</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Log</span><span class="p">)</span><span class="w"> </span><span class="kt">any</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Type</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">LogCommand</span><span class="p">:</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="nx">setPayload</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">log</span><span class="p">.</span><span class="nx">Data</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">sp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not parse payload: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">kf</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Store</span><span class="p">(</span><span class="nx">sp</span><span class="p">.</span><span class="nx">Key</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unknown raft log type: %#v"</span><span class="p">,</span><span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Type</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>Here we're reading a log in a custom format. Later on down in the HTTP
server we'll write the part that submits that log in this custom
format.</p>
<p>The Raft library just cares that logs are (opaque) bytes. Whatever
format works.</p>
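<p>To illustrate "whatever format works": the payload doesn't have to be JSON.
Here is a sketch of a tiny binary format (a varint key length, then the key
bytes, then the value bytes). <code>encodeSet</code> and <code>decodeSet</code> are hypothetical
helpers for this example only; the rest of this post sticks with JSON.</p>

```go
package main

import (
	"encoding/binary"
	"errors"
)

// encodeSet packs a key and value as: uvarint(len(key)) | key | value.
func encodeSet(key, value string) []byte {
	var lenBuf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(lenBuf[:], uint64(len(key)))
	buf := append([]byte{}, lenBuf[:n]...)
	buf = append(buf, key...)
	return append(buf, value...)
}

// decodeSet reverses encodeSet and rejects truncated input.
func decodeSet(data []byte) (key, value string, err error) {
	keyLen, read := binary.Uvarint(data)
	if read <= 0 || uint64(len(data)-read) < keyLen {
		return "", "", errors.New("malformed set payload")
	}
	rest := data[read:]
	return string(rest[:keyLen]), string(rest[keyLen:]), nil
}
```

<p>As long as the code that submits a log and the code in <code>Apply</code> agree on
the encoding, the bytes in between are none of the Raft library's business.</p>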
<h4 id="restore">Restore</h4><p>The Restore operation reads all logs and applies them to the state
machine.</p>
<p>It looks very similar to the <code>Apply</code> function we just wrote, except
that this operates on an <code>io.ReadCloser</code> of serialized log data rather
than the high-level <code>raft.Log</code> struct.</p>
<p>And most importantly, and unlike the <code>Apply</code> function, <code>Restore</code> must
reset all local state.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">kf</span><span class="w"> </span><span class="o">*</span><span class="nx">kvFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Restore</span><span class="p">(</span><span class="nx">rc</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">ReadCloser</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Must always restore from a clean state!!</span>
<span class="w"> </span><span class="nx">kf</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Range</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">key</span><span class="w"> </span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">kf</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Delete</span><span class="p">(</span><span class="nx">key</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="nx">decoder</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewDecoder</span><span class="p">(</span><span class="nx">rc</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">decoder</span><span class="p">.</span><span class="nx">More</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="nx">setPayload</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">decoder</span><span class="p">.</span><span class="nx">Decode</span><span class="p">(</span><span class="o">&</span><span class="nx">sp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not decode payload: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">kf</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Store</span><span class="p">(</span><span class="nx">sp</span><span class="p">.</span><span class="nx">Key</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">rc</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="p">}</span>
</pre></div>
<p>The <code>io.ReadCloser</code> represents the latest snapshot or the beginning of
time if there are no snapshots.</p>
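<p>The decoding loop in <code>Restore</code> can be tried without Raft at all. This is a
stdlib-only sketch that assumes the stream is a sequence of back-to-back JSON
objects shaped like <code>setPayload</code>; <code>decodeStream</code> and <code>decodeString</code> are
hypothetical names for this example.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

type setPayload struct {
	Key   string
	Value string
}

// decodeStream reads back-to-back JSON objects from r until the input
// runs out and returns the key-value state they describe.
func decodeStream(r io.Reader) (map[string]string, error) {
	// Start from a clean state, as Restore must.
	state := map[string]string{}
	decoder := json.NewDecoder(r)
	for decoder.More() {
		var sp setPayload
		if err := decoder.Decode(&sp); err != nil {
			return nil, fmt.Errorf("could not decode payload: %w", err)
		}
		state[sp.Key] = sp.Value
	}
	return state, nil
}

// decodeString is a convenience wrapper for trying the decoder on a literal.
func decodeString(s string) (map[string]string, error) {
	return decodeStream(strings.NewReader(s))
}
```

<p>An empty stream simply produces an empty map, which matches the "beginning
of time" case.</p>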
<h4 id="snapshot">Snapshot</h4><p>We won't implement this. But to satisfy the Go interface we must have
some empty functions.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="w"> </span><span class="kd">struct</span><span class="p">{}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">sn</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">)</span><span class="w"> </span><span class="nx">Persist</span><span class="p">(</span><span class="nx">_</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">SnapshotSink</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">sn</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">)</span><span class="w"> </span><span class="nx">Release</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">kf</span><span class="w"> </span><span class="o">*</span><span class="nx">kvFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Snapshot</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="nx">raft</span><span class="p">.</span><span class="nx">FSMSnapshot</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">{},</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p class="note">
I <em>think</em> this is a correct noop. If we implemented a real
snapshot we'd serialize the current key-value state, and <code>raft.SnapshotSink.Write()</code> it
to the <code>raft.SnapshotSink</code>. That sink, in turn, is what is passed (as
an <code>io.ReadCloser</code>) to the <code>Restore</code> method above.
<br />
<br />
So it must be that when we do not call <code>raft.SnapshotSink.Close()</code>, <a href="https://pkg.go.dev/github.com/hashicorp/raft#FSMSnapshot">as the docs suggest</a>,
no snapshot gets recorded.
<br />
<br />
Since we aren't implementing snapshots, the Raft
library must be doing its own serialization, writing each message's
bytes directly to some sink.
<br />
<br />
If I'm wrong, <a href="mailto:[email protected]">feel free to correct me</a>.
</p><p>That's it for the state machine!</p>
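<p>For the curious, the serialization step a real snapshot would need might look
roughly like this. It is only a sketch under assumptions: it writes to a plain
<code>io.Writer</code> instead of a <code>raft.SnapshotSink</code>, it emits one JSON
<code>setPayload</code> per key so that a <code>Restore</code> like the one above could read it
back, and <code>writeSnapshot</code> and <code>snapshotOf</code> are hypothetical names.</p>

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"sync"
)

type setPayload struct {
	Key   string
	Value string
}

// writeSnapshot serializes every entry in db as one JSON object per line.
func writeSnapshot(db *sync.Map, w io.Writer) error {
	encoder := json.NewEncoder(w)
	var encodeErr error
	db.Range(func(key, value any) bool {
		k, okKey := key.(string)
		v, okValue := value.(string)
		if !okKey || !okValue {
			encodeErr = fmt.Errorf("unexpected entry type: %T -> %T", key, value)
			return false
		}
		encodeErr = encoder.Encode(setPayload{Key: k, Value: v})
		// Stop iterating on the first write failure.
		return encodeErr == nil
	})
	return encodeErr
}

// snapshotOf is a convenience wrapper that snapshots a plain map into memory.
func snapshotOf(kv map[string]string) ([]byte, error) {
	db := &sync.Map{}
	for k, v := range kv {
		db.Store(k, v)
	}
	var buf bytes.Buffer
	if err := writeSnapshot(db, &buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```

<p>Note that <code>sync.Map.Range</code> makes no ordering promise, so the entries can
come out in any order. That is fine here because each key appears only once.</p>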
<h3 id="raft-node-initialization">Raft node initialization</h3><p>In order to start the Raft library behavior for each node, we need a
whole bunch of boilerplate for Raft library initialization.</p>
<p>Each Raft node needs a TCP port that it uses to communicate with other
nodes in the same cluster.</p>
<p>Each node starts out in a single-node cluster where it is the
leader. Only when told to (and given the addresses of other nodes) does
it become part of a multi-node cluster.</p>
<p>Each node also needs a permanent store for the append-only log. The
Hashicorp Raft library suggests
<a href="https://github.com/hashicorp/raft-boltdb">boltdb</a>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">setupRaft</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">nodeId</span><span class="p">,</span><span class="w"> </span><span class="nx">raftAddress</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">kf</span><span class="w"> </span><span class="o">*</span><span class="nx">kvFsm</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirAll</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">ModePerm</span><span class="p">)</span>
<span class="w"> </span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raftboltdb</span><span class="p">.</span><span class="nx">NewBoltStore</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="s">"bolt"</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not create bolt store: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">snapshots</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewFileSnapshotStore</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="s">"snapshot"</span><span class="p">),</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Stderr</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not create snapshot store: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">tcpAddr</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">ResolveTCPAddr</span><span class="p">(</span><span class="s">"tcp"</span><span class="p">,</span><span class="w"> </span><span class="nx">raftAddress</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not resolve address: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">transport</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewTCPTransport</span><span class="p">(</span><span class="nx">raftAddress</span><span class="p">,</span><span class="w"> </span><span class="nx">tcpAddr</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="o">*</span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Stderr</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not create tcp transport: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">raftCfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">DefaultConfig</span><span class="p">()</span>
<span class="w"> </span><span class="nx">raftCfg</span><span class="p">.</span><span class="nx">LocalID</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerID</span><span class="p">(</span><span class="nx">nodeId</span><span class="p">)</span>
<span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewRaft</span><span class="p">(</span><span class="nx">raftCfg</span><span class="p">,</span><span class="w"> </span><span class="nx">kf</span><span class="p">,</span><span class="w"> </span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span class="nx">snapshots</span><span class="p">,</span><span class="w"> </span><span class="nx">transport</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not create raft instance: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Cluster consists of unjoined leaders. Picking a leader and</span>
<span class="w"> </span><span class="c1">// creating a real cluster is done manually after startup.</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">BootstrapCluster</span><span class="p">(</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Configuration</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Servers</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Server</span><span class="p">{</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ID</span><span class="p">:</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerID</span><span class="p">(</span><span class="nx">nodeId</span><span class="p">),</span>
<span class="w"> </span><span class="nx">Address</span><span class="p">:</span><span class="w"> </span><span class="nx">transport</span><span class="p">.</span><span class="nx">LocalAddr</span><span class="p">(),</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>Now let's dig into how nodes learn about each other.</p>
<h3 id="an-http-api">An HTTP API</h3><p>This key-value store application will have an HTTP API serving two purposes:</p>
<ul>
<li>Cluster management: telling a leader to add followers</li>
<li>Key-value storage: setting and getting keys</li>
</ul>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">httpServer</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span>
<span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span>
<span class="p">}</span>
</pre></div>
<h4 id="cluster-management">Cluster management</h4><p>In this library, the leader is told to add other nodes as its
followers. (This feels backwards to me, but it is what it is!)</p>
<p>For this, the library requires the follower's node ID and the address
(host and TCP port) it uses for Raft messages.</p>
<p>These will both be parameters we give each node later on when the node
process is started.</p>
<div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> </span><span class="p">(</span><span class="n">hs</span><span class="w"> </span><span class="n">httpServer</span><span class="p">)</span><span class="w"> </span><span class="n">joinHandler</span><span class="p">(</span><span class="n">w</span><span class="w"> </span><span class="n">http</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">followerId</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">r</span><span class="o">.</span><span class="n">URL</span><span class="o">.</span><span class="n">Query</span><span class="p">()</span><span class="o">.</span><span class="n">Get</span><span class="p">(</span><span class="s2">"followerId"</span><span class="p">)</span>
<span class="w"> </span><span class="n">followerAddr</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">r</span><span class="o">.</span><span class="n">URL</span><span class="o">.</span><span class="n">Query</span><span class="p">()</span><span class="o">.</span><span class="n">Get</span><span class="p">(</span><span class="s2">"followerAddr"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">hs</span><span class="o">.</span><span class="n">r</span><span class="o">.</span><span class="n">State</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">raft</span><span class="o">.</span><span class="n">Leader</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">w</span><span class="o">.</span><span class="n">WriteHeader</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">)</span>
<span class="w"> </span><span class="n">json</span><span class="o">.</span><span class="n">NewEncoder</span><span class="p">(</span><span class="n">w</span><span class="p">)</span><span class="o">.</span><span class="n">Encode</span><span class="p">(</span><span class="n">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Error</span><span class="w"> </span><span class="n">string</span><span class="w"> </span><span class="err">`</span><span class="n">json</span><span class="p">:</span><span class="s2">"error"</span><span class="err">`</span>
<span class="w"> </span><span class="p">}{</span>
<span class="w"> </span><span class="s2">"Not the leader"</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">hs</span><span class="o">.</span><span class="n">r</span><span class="o">.</span><span class="n">AddVoter</span><span class="p">(</span><span class="n">raft</span><span class="o">.</span><span class="n">ServerID</span><span class="p">(</span><span class="n">followerId</span><span class="p">),</span><span class="w"> </span><span class="n">raft</span><span class="o">.</span><span class="n">ServerAddress</span><span class="p">(</span><span class="n">followerAddr</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">Error</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">log</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s2">"Failed to add follower: </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="p">)</span>
<span class="w"> </span><span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span class="p">,</span><span class="w"> </span><span class="n">http</span><span class="o">.</span><span class="n">StatusText</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">w</span><span class="o">.</span><span class="n">WriteHeader</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">StatusOK</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
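<p>The handler above trusts whatever is in the query string. If you wanted to
reject obviously bad join requests before they reach the Raft library, a
stdlib-only check might look like this sketch. <code>parseJoinQuery</code> and
<code>parseJoinURL</code> are hypothetical helpers, not something the handler above
uses, and the host and ports in any example URL are placeholders.</p>

```go
package main

import (
	"errors"
	"net"
	"net/url"
)

// parseJoinQuery pulls followerId and followerAddr out of the query
// parameters and checks that the address looks like host:port.
func parseJoinQuery(q url.Values) (id, addr string, err error) {
	id = q.Get("followerId")
	addr = q.Get("followerAddr")
	if id == "" {
		return "", "", errors.New("missing followerId")
	}
	if _, _, splitErr := net.SplitHostPort(addr); splitErr != nil {
		return "", "", errors.New("followerAddr must be host:port")
	}
	return id, addr, nil
}

// parseJoinURL is a convenience wrapper for trying this on a raw URL string.
func parseJoinURL(raw string) (id, addr string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return parseJoinQuery(u.Query())
}
```

<p>For example, a request to
<code>/join?followerId=node2&amp;followerAddr=localhost:2223</code> would parse to the ID
<code>node2</code> and the address <code>localhost:2223</code>.</p>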
<h4 id="key-value-storage">Key-value storage</h4><p>This part of the HTTP API exposes setting and getting.</p>
<h5 id="set">Set</h5><p>Setting is where, instead of modifying the local database directly, we
pass a message to the Raft cluster to store a log that contains the
key and value.</p>
<p>Since we read log messages in <code>kvFsm.Apply</code> and <code>kvFsm.Restore</code> as a
JSON encoding of the <code>setPayload</code> struct we created, we must write log
messages like so as well. Or, specifically in this case, we just
expect that the user passes a JSON body that matches the <code>setPayload</code>
struct.</p>
<p>Then we call <code>Apply</code> on the Raft instance with the log message to get
this process going.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">hs</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">)</span><span class="w"> </span><span class="nx">setHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Body</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">ReadAll</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">Body</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Could not read key-value in http request: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">future</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">Apply</span><span class="p">(</span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="mi">500</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Millisecond</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Blocks until completion</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">future</span><span class="p">.</span><span class="nx">Error</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Could not write key-value: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">future</span><span class="p">.</span><span class="nx">Response</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Could not write key-value, application: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusOK</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p class="note">
I'm not completely sure if <code>future.Response()</code> is supposed to be
called from inside the <code>future.Error()</code> error block. You
can <a href="https://pkg.go.dev/github.com/hashicorp/raft#ApplyFuture">see
the docs</a> for yourself.
</p><h5 id="get">Get</h5><p>If we wanted to be completely consistent we would need to pass a
<code>read</code> message through to the Raft cluster and check its result for a
key's value. We'd need to implement that <code>read</code> message in the state
machine.</p>
<p>But if we don't care strongly about consistency for reads we can just
read the local in-memory store, skipping the Raft cluster.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">hs</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">)</span><span class="w"> </span><span class="nx">getHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">key</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">"key"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Load</span><span class="p">(</span><span class="nx">key</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">""</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Data</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="s">`json:"data"`</span>
<span class="w"> </span><span class="p">}{</span><span class="nx">value</span><span class="p">.(</span><span class="kt">string</span><span class="p">)}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">w</span><span class="p">).</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">rsp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Could not encode key-value in http response: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>And that's it for the server!</p>
<h3 id="configuration">Configuration</h3><p>Let's throw together a quick helper for grabbing configuration from
the CLI.</p>
<p>When the process is started, each node must be configured
with a Raft-level TCP port, a Raft-level unique node ID, and an
HTTP port (for our application).</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">httpPort</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">raftPort</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">config</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--node-id"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span>
<span class="w"> </span><span class="nx">i</span><span class="o">++</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--http-port"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">httpPort</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span>
<span class="w"> </span><span class="nx">i</span><span class="o">++</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--raft-port"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span>
<span class="w"> </span><span class="nx">i</span><span class="o">++</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Missing required parameter: --node-id"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Missing required parameter: --raft-port"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">httpPort</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Missing required parameter: --http-port"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">cfg</span>
<span class="p">}</span>
</pre></div>
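<p>The hand-rolled loop above is fine for a demo, but it has two rough
edges: it will panic with an index-out-of-range error if a flag is the
last argument with no value after it, and the <code>i++</code> has no
effect on the next iteration of a <code>range</code> loop. If that
matters to you, a sketch using the standard library's <code>flag</code>
package might look like the following. <code>parseConfig</code> is a
hypothetical name, and it takes the argument list as a parameter only
so that it is easy to try out. Go's <code>flag</code> package accepts
both <code>-node-id</code> and <code>--node-id</code>, so the commands
later in this post would not need to change.</p>

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

type config struct {
	id       string
	httpPort string
	raftPort string
}

// parseConfig is a hypothetical variant of getConfig built on the
// standard flag package. It returns an error instead of exiting.
func parseConfig(args []string) (config, error) {
	cfg := config{}
	fs := flag.NewFlagSet("raft-example", flag.ContinueOnError)
	fs.StringVar(&cfg.id, "node-id", "", "unique Raft node ID")
	fs.StringVar(&cfg.httpPort, "http-port", "", "port for the HTTP API")
	fs.StringVar(&cfg.raftPort, "raft-port", "", "port for Raft traffic")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}

	if cfg.id == "" {
		return cfg, errors.New("missing required parameter: --node-id")
	}
	if cfg.raftPort == "" {
		return cfg, errors.New("missing required parameter: --raft-port")
	}
	if cfg.httpPort == "" {
		return cfg, errors.New("missing required parameter: --http-port")
	}
	return cfg, nil
}

func main() {
	// A real program would pass os.Args[1:] here; a fixed list keeps
	// the sketch self-contained.
	args := []string{"--node-id", "node1", "--raft-port", "2222", "--http-port", "8222"}
	cfg, err := parseConfig(args)
	if err != nil {
		fmt.Println("bad configuration:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```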
<p>And finally, the <code>main()</code> that brings it all together.</p>
<h3 id="main">main</h3><div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span>
<span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&</span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">kf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&</span><span class="nx">kvFsm</span><span class="p">{</span><span class="nx">db</span><span class="p">}</span>
<span class="w"> </span><span class="nx">dataDir</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">"data"</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirAll</span><span class="p">(</span><span class="nx">dataDir</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">ModePerm</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Could not create data directory: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">setupRaft</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dataDir</span><span class="p">,</span><span class="w"> </span><span class="s">"raft"</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="s">"localhost:"</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="p">,</span><span class="w"> </span><span class="nx">kf</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">hs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">{</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">db</span><span class="p">}</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">"/set"</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">setHandler</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">"/get"</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">getHandler</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">"/join"</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">joinHandler</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">":"</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">httpPort</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Build it.</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">mod</span><span class="w"> </span><span class="nx">init</span><span class="w"> </span><span class="nx">raft</span><span class="o">-</span><span class="nx">example</span>
<span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">mod</span><span class="w"> </span><span class="nx">tidy</span>
<span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">build</span>
</pre></div>
<p>And give it a shot. :)</p>
<p>Terminal 1:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./raft-example<span class="w"> </span>--node-id<span class="w"> </span>node1<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2222</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8222</span>
</pre></div>
<p>Terminal 2:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./raft-example<span class="w"> </span>--node-id<span class="w"> </span>node2<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2223</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8223</span>
</pre></div>
<p>Terminal 3, tell node 1 to have node 2 follow it:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span><span class="s1">'localhost:8222/join?followerAddr=localhost:2223&followerId=node2'</span>
</pre></div>
<p>Terminal 3, now add a key:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span><span class="s1">'localhost:8222/set'</span><span class="w"> </span>-d<span class="w"> </span><span class="s1">'{"key": "x", "value": "23"}'</span><span class="w"> </span>-H<span class="w"> </span><span class="s1">'content-type: application/json'</span>
</pre></div>
<p>Terminal 3, now get the key from either server:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span><span class="s1">'localhost:8222/get?key=x'</span>
<span class="o">{</span><span class="s2">"data"</span>:<span class="s2">"23"</span><span class="o">}</span>
$<span class="w"> </span>curl<span class="w"> </span><span class="s1">'localhost:8223/get?key=x'</span>
<span class="o">{</span><span class="s2">"data"</span>:<span class="s2">"23"</span><span class="o">}</span>
</pre></div>
<p>And we're golden!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Following up on that "build a distributed postgres" post I wanted to write down a shorter intro to building a stateful, distributed application using Hashicorp's Raft library.<br><br>So, here's a new blog post!<br><br>Also, try reading the Raft paper! It's not bad 😀<a href="https://t.co/C4S3uzxm0W">https://t.co/C4S3uzxm0W</a> <a href="https://t.co/L3Wwawe0UC">pic.twitter.com/L3Wwawe0UC</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1571662239559716865?ref_src=twsrc%5Etfw">September 19, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/minimal-key-value-store-with-hashicorp-raft.htmlSat, 17 Sep 2022 00:00:00 +0000
- What's the big deal about key-value databases like FoundationDB and RocksDB?http://notes.eatonphil.com/whats-the-big-deal-about-key-value-databases.html<p>Let's assume you're familiar with basic SQL databases like PostgreSQL
and MySQL, and document databases like MongoDB and Elasticsearch. You
probably know Redis too.</p>
<p>But you're hearing more and more about embedded key-value stores like
<a href="http://rocksdb.org/">RocksDB</a>,
<a href="https://github.com/google/leveldb">LevelDB</a>,
<a href="https://github.com/cockroachdb/pebble">PebbleDB</a>, and so on. And
you're hearing about distributed key-value databases like
<a href="https://www.foundationdb.org/">FoundationDB</a> and
<a href="https://tikv.org/">TiKV</a>.</p>
<p>What's the big deal? Aren't these just the equivalent of Redis or
Java's ConcurrentHashMap?</p>
<p>Let's take a look.</p>
<h3 id="extensible-databases">Extensible databases</h3><p>Over the last 10 years or so (at least), databases have become more
extensible. MySQL has around <a href="https://dev.mysql.com/doc/refman/8.0/en/storage-engines.html">10 different open-source storage
engines</a>. More
surely exist in the wild.</p>
<p>Mongo supports <a href="https://www.mongodb.com/docs/manual/core/storage-engines/">multiple storage
engines</a>. Relatively
late, PostgreSQL version 12 added support for <a href="https://www.postgresql.org/docs/current/tableam.html">pluggable storage
engines</a>.</p>
<p class="note">
<a href="https://github.com/orioledb/orioledb">OrioleDB</a>
and <a href="https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-compression-for-postgres/">Citus
10's columnar compression</a> are two particularly interesting
databases making use of PostgreSQL's pluggable storage engine support. But
since neither uses an embedded key-value store, I won't talk about
them more in this post.
</p><p>And so on.</p>
<h4 id="but-why?">But why?</h4><p>Swapping out storage engines allows you to tune the performance of
your database. It can allow you to swap out row-oriented storage for
column-oriented storage (useful for analytics workloads).</p>
<p>It can allow you to swap B-Trees (traditional choice) for <a href="http://www.benstopford.com/2015/02/14/log-structured-merge-trees/">LSM
Trees</a>
(new hotness) as the underlying storage method (useful for optimizing
write-heavy workloads).</p>
<p>And since some storage engines themselves are built on distributed
consensus (like <a href="https://github.com/apple/foundationdb">FoundationDB</a>
and <a href="https://github.com/tikv/tikv">TiKV</a>), it may even allow you to
turn a non-distributed database into a distributed database.</p>
<h3 id="mapping-sql-to-key-value-storage">Mapping SQL to key-value storage</h3><p>But how the heck do you turn SQL, row-oriented data, into key-value
data?</p>
<p>CockroachDB is a SQL database that was <a href="https://www.cockroachlabs.com/blog/pebble-rocksdb-kv-store/">originally built on RocksDB and now runs on
its own LevelDB-inspired
store</a>,
called <a href="https://github.com/cockroachdb/pebble">PebbleDB</a>.</p>
<p>I mention CockroachDB here because its developers maintain a great doc
about <a href="https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/encoding.md">their method of encoding rows to key-value
form</a>.</p>
<p>To simplify that doc, though, you can imagine mapping each row
to a key-value form like this:</p>
<div class="highlight"><pre><span></span><span class="nx">$</span><span class="p">{</span><span class="nx">TABLE_IDENTIFIER</span><span class="p">}</span><span class="nx">_$</span><span class="p">{</span><span class="nx">PRIMARY_KEY</span><span class="p">}</span><span class="nx">_$</span><span class="p">{</span><span class="nx">ROW_IDENTIFIER</span><span class="p">}</span><span class="o">:</span><span class="w"> </span><span class="nx">$</span><span class="p">{</span><span class="nx">ENCODED_VALUE</span><span class="p">}</span>
</pre></div>
<p>Embedded key-value stores almost always support efficient scanning of
rows by a key-prefix. This means that you can efficiently grab all rows
within a table by prefix-scanning on the table identifier. If you also
include a primary key value along with the table identifier prefix,
you get efficient primary key lookup.</p>
<p>Even though the key space is flat.</p>
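<p>To make the prefix-scan idea concrete, here is a toy sketch. It is
not how RocksDB or Pebble are implemented. It only assumes what those
stores provide, namely that keys are kept in sorted order. The
<code>store</code> type, <code>rowKey</code>, and
<code>scanPrefix</code> are names invented for this example, and the
key format is simplified to just table identifier and primary key.</p>

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// store is a toy stand-in for an ordered key-value store: keys are kept
// sorted, as they would be in a B-Tree or an LSM Tree.
type store struct {
	keys   []string
	values map[string]string
}

func newStore() *store {
	return &store{values: map[string]string{}}
}

// set inserts the key at its sorted position if it is new, then stores
// the value.
func (s *store) set(key, value string) {
	if _, exists := s.values[key]; !exists {
		i := sort.SearchStrings(s.keys, key)
		s.keys = append(s.keys, "")
		copy(s.keys[i+1:], s.keys[i:])
		s.keys[i] = key
	}
	s.values[key] = value
}

// scanPrefix binary-searches to the first key >= prefix and walks
// forward until a key no longer matches. Keys outside the prefix range
// are never visited, apart from the one that ends the walk.
func (s *store) scanPrefix(prefix string) []string {
	var out []string
	for i := sort.SearchStrings(s.keys, prefix); i < len(s.keys); i++ {
		if !strings.HasPrefix(s.keys[i], prefix) {
			break
		}
		out = append(out, s.keys[i])
	}
	return out
}

// rowKey builds a simplified ${TABLE_IDENTIFIER}_${PRIMARY_KEY} key.
func rowKey(table, primaryKey string) string {
	return table + "_" + primaryKey
}

func main() {
	s := newStore()
	s.set(rowKey("users", "2"), `{"name": "Grace"}`)
	s.set(rowKey("orders", "1"), `{"total": 10}`)
	s.set(rowKey("users", "1"), `{"name": "Ada"}`)

	// All rows in the users table, without touching the orders rows.
	for _, k := range s.scanPrefix("users_") {
		fmt.Println(k, s.values[k])
	}
}
```

<p>One caveat with string keys like these: a prefix of
<code>users_1</code> would also match <code>users_10</code>. Real
encodings avoid this with fixed-width or otherwise unambiguous key
components, which is part of what the CockroachDB doc covers.</p>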
<p>For the encoded value you can pick any encoding scheme; as
inefficient as JSON or as efficient as some binary scheme like
Protocol Buffers or Parquet.</p>
<p class="note">
Thanks to <a href="https://twitter.com/justinjaffray">Justin Jaffray</a> for
pointing me at the CockroachDB doc and confirming some of my thoughts
on encoding strategies.
</p><h4 id="tutorials">Tutorials</h4><p>I've written a couple of tutorials on building a database. They build on
top of embedded key-value stores. If you're interested in seeing
minimal code walkthroughs of how this process can work, check these
posts out:</p>
<ul>
<li><a href="https://notes.eatonphil.com/distributed-postgres.html">Let's build a distributed Postgres proof of concept</a></li>
<li><a href="https://notes.eatonphil.com/documentdb.html">Writing a document database from scratch in Go: Lucene-like filters and indexes</a></li>
</ul>
<h3 id="major-aspects-of-key-value-stores">Major aspects of key-value stores</h3><p>Now that you understand how a database can map to a key-value store,
let's take a look at the particular properties that distinguish all
these key-value stores from systems like Redis and Memcached.</p>
<h4 id="reliable-storage">Reliable storage</h4><p>Maybe the single most important thing a storage system does is actually
store data reliably. You can't just <code>open()</code> and <code>write()</code>. To quote
Dan Luu, <a href="https://danluu.com/file-consistency/">files are
hard</a>.</p>
<p>Deferring storage correctness to a dedicated system means database
developers can worry about other aspects of database development.</p>
<h4 id="embeddable">Embeddable</h4><p>Beyond being reliable, the storage needs to run
in process. Redis, for example, is not embeddable. A high-level
database has many other things to do on top of storage, and RPC calls
between processes for storage are unnecessary
overhead.</p>
<h4 id="efficient-prefix-scans">Efficient prefix scans</h4><p>As mentioned above, support for scans is pretty important for how
indexes and namespaces (tables in SQL) get mapped to key-value
queries.</p>
<p>You shouldn't need to look through all table rows in the flat key space to
find the rows for one table.</p>
<h4 id="additional-aspects">Additional aspects</h4><p>The above isn't a complete list. Different stores provide different
useful features, such as improved performance on certain workloads or in
certain environments, built-in transactions, and so on.</p>
<p>And sometimes it's helpful just to have an embedded store in your
language rather than going through a foreign-function interface.</p>
<h3 id="survey-of-databases-built-on-embedded-key-value-stores">Survey of databases built on embedded key-value stores</h3><p>Lastly, let's take a look at a few databases that build on top of
embedded key-value stores.</p>
<p>Note that some of them are not the primary version of the database
(e.g. MyRocks vs MySQL, MongoRocks vs Mongo). Some of them are the
primary version (e.g. CockroachDB, YugabyteDB).</p>
<h4 id="document-databases-built-on-key-value-stores">Document databases built on key-value stores</h4><ul>
<li><a href="https://www.percona.com/doc/percona-server-for-mongodb/3.4/mongorocks.html">MongoRocks</a> (Mongo on RocksDB)</li>
</ul>
<h4 id="sql-databases-built-on-key-value-stores">SQL databases built on key-value stores</h4><ul>
<li><a href="http://myrocks.io/">MyRocks</a> (MySQL on RocksDB)</li>
<li><a href="https://www.cockroachlabs.com">CockroachDB</a> (RocksDB originally, now their own PebbleDB)</li>
<li><a href="https://www.yugabyte.com/blog/how-we-built-a-high-performance-document-store-on-rocksdb/">YugabyteDB</a> (on DocDB on RocksDB)</li>
<li><a href="https://www.gridgain.com/resources/blog/apache-ignite-3-alpha-3-apache-calcite-raft-and-lsm-tree">Apache Ignite</a> (Calcite on RocksDB)</li>
</ul>
<h4 id="redis-compatible-databases-built-on-key-value-stores">Redis-compatible databases built on key-value stores</h4><ul>
<li><a href="https://engineering.fb.com/2021/08/06/core-data/zippydb/">ZippyDB</a> (Redis-compatible database on RocksDB)</li>
<li><a href="https://redis.com/blog/hood-redis-enterprise-flash-database-architecture/">Redis Enterprise Flash</a> (Redis on RocksDB)</li>
</ul>
<h4 id="other-databases-built-on-key-value-stores">Other databases built on key-value stores</h4><ul>
<li><a href="https://thenewstack.io/instagram-supercharges-cassandra-pluggable-rocksdb-storage-engine/">Rocksandra</a> (Cassandra on RocksDB)</li>
</ul>
<p>Missing a database? Let me know!</p>
<h4 id="separately,-distributed-key-value-stores">Separately, distributed key-value stores</h4><p>There is a different kind of key-value store that is a standalone app
designed for distributed data. This list includes
<a href="https://www.consul.io/">Consul</a>,
<a href="https://etcd.io/docs/v3.4/learning/why/">etcd</a>, likely
<a href="https://www.foundationdb.org/">FoundationDB</a>, and likely
<a href="https://engineering.fb.com/2021/08/06/core-data/zippydb/">ZippyDB</a>. (There's
a nice comparison table about some of these databases on the etcd
page).</p>
<p>These systems are designed to be used sort of like Redis, except
that they are persistent and reliable stores. They are designed to
always be up and always correct. For that reason they form the data
storage backbone of core infrastructure like Kubernetes.</p>
<p>This is possibly how <a href="https://www.snowflake.com/blog/how-foundationdb-powers-snowflake-metadata-forward/">Snowflake uses
FoundationDB</a>,
but I'm not 100% sure.</p>
<p>TiKV is not an embedded key-value database but it's not being used the
same way etcd/Consul are as far as I can tell. It forms the backbone
of <a href="https://en.pingcap.com/">TiDB</a>, an HTAP (hybrid OLAP/OLTP) SQL
database.</p>
<p>Maybe FoundationDB and TiKV deserve their own new category.</p>
<p>But in general these databases have an RPC API that you communicate
with over TCP. They are not generally embedded. You manage their
process(es) separately.</p>
<h3 id="conclusion">Conclusion</h3><p>So in this post we saw that databases are extensible. Storage engines
are often swappable. Dedicated embedded key-value stores allow
database developers to hand off data storage to a dedicated
library. Different key-value stores have different performance
characteristics that help developers and operators tune a database for
their workload.</p>
<p>Embedded key-value stores are a great foundation for all kinds of
databases: SQL databases like CockroachDB, document databases like
Mongo, wide-column databases like Cassandra, and caching databases like
ZippyDB or Redis Enterprise Flash.</p>
<p>This is a complex topic with many, many variations of
systems. Hopefully this was a useful introduction.</p>
<p>Overall if you're not a database developer and you're not running
databases at a massive scale, you can probably ignore the details of
the storage layer.</p>
<p>Did I get something wrong? Or miss something important? Let me
know. :)</p>
<h3 id="corrections">Corrections</h3><ul>
<li>An earlier version of this post suggested that FoundationDB was
embedded. It is not. Thanks <a href="https://lobste.rs/s/avljlh/what_s_big_deal_about_embedded_key_value#c_rx0oid">adaszko on Lobsters for
correcting</a>.</li>
<li>An earlier version of this post suggested that TiKV was embedded. It is not. Thanks <a href="https://news.ycombinator.com/user?id=eis">eis on Hacker News</a>.</li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">What's the big deal about embedded key-value databases like FoundationDB ands RocksDB?<br><br>I wrote a new blog post that might give a little context. :)<a href="https://t.co/kNFM1hVGx6">https://t.co/kNFM1hVGx6</a> <a href="https://t.co/H4SouStZHk">pic.twitter.com/H4SouStZHk</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1562106582544039937?ref_src=twsrc%5Etfw">August 23, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/whats-the-big-deal-about-key-value-databases.htmlTue, 23 Aug 2022 00:00:00 +0000
- SQLite has pretty limited builtin functionshttp://notes.eatonphil.com/2022-08-21-sqlite-limited-builtin-functions.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-08-21-sqlite-limited-builtin-functions.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2022-08-21-sqlite-limited-builtin-functions.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2022-08-21-sqlite-limited-builtin-functions.htmlSun, 21 Aug 2022 00:00:00 +0000
- Container scheduling strategies for integration testing 14 different databases in Github Actionshttp://notes.eatonphil.com/2022-07-25-database-integration-testing.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-07-25-database-integration-testing.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2022-07-25-database-integration-testing.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2022-07-25-database-integration-testing.htmlMon, 25 Jul 2022 00:00:00 +0000
- Implementing a simple jq clone in Go, and basics of Go memory profilinghttp://notes.eatonphil.com/implementing-a-jq-clone-in-go.html<p>In this post we'll build a basic jq clone in Go. It will only be able
to pull a single path out of each object it reads. It won't be able to
do filters, mapping, etc.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>head<span class="w"> </span>-n2<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">'.repo.url'</span>
<span class="s2">"https://api.github.com/repos/petroav/6.828"</span>
<span class="s2">"https://api.github.com/repos/rspt/rspt-theme"</span>
</pre></div>
<p>We'll start by building a "control" implementation that uses Go's
builtin JSON library with a JSON path tool on top.</p>
<p>Then we'll implement a basic path-aware JSON parser in 600 lines of
Go. It's going to use a technique I call "partial parsing" or "fuzzy
parsing" (there may be a better name for it) where we fully parse what we
care about and only <em>sort of</em> parse the rest.</p>
<p>Why partial parsing? There are two general reasons. One is to use
less memory than parsers that must always turn all of a text into an
object in your language. The other is for when the language has
complexities you don't want or need to deal with. We'll basically have
to deal with all the complexities of JSON so this post is about the
former reason: using less memory. I've written about a case for the
second reason though in <a href="https://datastation.multiprocess.io/blog/2021-10-31-building-a-nested-css-rule-expander.html">building a simple, fast SCSS
implementation</a>.</p>
<p class="note">
This partial parser is more complex than a typical handwritten
parser. If you are unfamiliar with handwritten JSON parsers, you may
want to take a look
at <a href="https://notes.eatonphil.com/tags/json.html">previous
articles</a> I've written about parsing JSON.
</p><p>Once we get this partial parser working we'll turn to Go's builtin
profiler to find what we can do to make it faster.</p>
<p>All code for this post is <a href="https://github.com/eatonphil/jqgo">available on
GitHub</a>.</p>
<h3 id="machine-specs,-versions">Machine specs, versions</h3><p>Since we're going to be doing some rudimentary comparisons of
performance, here are my details. I am running everything on a
dedicated server, <a href="https://us.ovhcloud.com/bare-metal/rise/rise-1/">OVH
Rise-1</a>.</p>
<ul>
<li>RAM: 64 GB DDR4 ECC 2,133 MHz</li>
<li>Disk: 2x450 GB SSD NVMe in Soft RAID</li>
<li>Processor: Intel Xeon E3-1230v6 - 4c/8t - 3.5 GHz/3.9 GHz</li>
</ul>
<p>And relevant versions:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>jq<span class="w"> </span>--version
jq-1.6
$<span class="w"> </span>go<span class="w"> </span>version
go<span class="w"> </span>version<span class="w"> </span>go1.18<span class="w"> </span>linux/amd64
$<span class="w"> </span>uname<span class="w"> </span>-a
Linux<span class="w"> </span>phil<span class="w"> </span><span class="m">5</span>.18.10-100.fc35.x86_64<span class="w"> </span><span class="c1">#1 SMP PREEMPT_DYNAMIC Thu Jul 7 17:41:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux</span>
</pre></div>
<p>Now buckle up!</p>
<h3 id="jq-using-go's-builtin-json-library">jq using Go's builtin JSON library</h3><p>This is a very simple program. We just parse JSON data from stdin in a
loop. After each parse we'll call an <code>extractValueAtPath</code>
function to grab the value at the path the user asks for.</p>
<p>To keep our path "parser" very simple we'll treat array access the
same as object access. So we'll look for <code>x.0</code> instead of <code>x[0]</code>,
unlike jq.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"encoding/json"</span>
<span class="w"> </span><span class="s">"io"</span>
<span class="w"> </span><span class="s">"log"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"strconv"</span>
<span class="w"> </span><span class="s">"strings"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">extractValueAtPath</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// TODO</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="s">"."</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">dec</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewDecoder</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdin</span><span class="p">)</span>
<span class="w"> </span><span class="nx">enc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdout</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Declare the map per iteration: Decode reuses a non-nil map and keeps stale keys</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dec</span><span class="p">.</span><span class="nx">Decode</span><span class="p">(</span><span class="o">&</span><span class="nx">a</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">extractValueAtPath</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">enc</span><span class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">v</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Then we implement the <code>extractValueAtPath</code> function itself,
entering into JSON arrays and objects until we reach the end of the
path.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">extractValueAtPath</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="kt">any</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">a</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arr</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">v</span><span class="p">.([]</span><span class="kt">any</span><span class="p">);</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">part</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
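<span class="w"> </span><span class="c1">// An out-of-range index yields null, like a missing key, instead of panicking</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">arr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">arr</span><span class="p">[</span><span class="nx">n</span><span class="p">]</span>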
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">m</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">v</span><span class="p">.(</span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Path into a non-map</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">m</span><span class="p">[</span><span class="nx">part</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Path does not exist</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
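<p>To see the "array index as a path part" convention in action, here is a small standalone sketch. It uses a condensed lookup that is separate from the program above (the <code>lookup</code> and <code>query</code> names are hypothetical), and it returns nil for out-of-range indexes and missing keys.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"strconv"
	"strings"
)

// lookup is a condensed, hypothetical stand-in for extractValueAtPath.
// Numeric path parts index into arrays; other parts index into objects.
func lookup(v any, path []string) any {
	for _, part := range path {
		switch t := v.(type) {
		case []any:
			n, err := strconv.Atoi(part)
			if err != nil || n < 0 || n >= len(t) {
				return nil
			}
			v = t[n]
		case map[string]any:
			v = t[part] // A missing key yields nil.
		default:
			return nil // Path continues into a scalar.
		}
	}
	return v
}

// query decodes doc and returns the value at a dotted path such as ".a.0.b".
func query(doc, path string) (any, error) {
	var v any
	if err := json.Unmarshal([]byte(doc), &v); err != nil {
		return nil, err
	}
	parts := strings.Split(strings.TrimPrefix(path, "."), ".")
	return lookup(v, parts), nil
}

func main() {
	doc := `{"repo": {"tags": ["go", "json"], "url": "https://example.com"}}`
	for _, p := range []string{".repo.url", ".repo.tags.1", ".repo.tags.5"} {
		v, err := query(doc, p)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s => %v\n", p, v)
	}
}
```

<p>Here <code>.repo.tags.1</code> plays the role of jq's <code>.repo.tags[1]</code>.</p>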
<p>Alright, let's give it a Go module and build and run it!</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>control
$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>tidy
$<span class="w"> </span>go<span class="w"> </span>build
<span class="c1"># Grab a test file</span>
$<span class="w"> </span>curl<span class="w"> </span>https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span>-c<span class="w"> </span><span class="s1">'.[]'</span><span class="w"> </span>><span class="w"> </span>large-file.json
$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>head<span class="w"> </span>-n2<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control<span class="w"> </span><span class="s1">'.repo.url'</span>
<span class="s2">"https://api.github.com/repos/petroav/6.828"</span>
<span class="s2">"https://api.github.com/repos/rspt/rspt-theme"</span>
</pre></div>
<p>Sweet. Now let's make sure it produces the same thing as jq.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>control.test
$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>jq.test
$<span class="w"> </span>diff<span class="w"> </span>jq.test<span class="w"> </span>control.test
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">0</span>
</pre></div>
<p>Great! It's working for a basic query. Let's see how it performs.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>hyperfine<span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="s2">"cat large-file.json | ./control '.repo.url' > control.test"</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="s2">"cat large-file.json | jq '.repo.url' > jq.test"</span>
Benchmark<span class="w"> </span><span class="m">1</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>control.test
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">310</span>.0<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">14</span>.4<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">296</span>.2<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">49</span>.3<span class="w"> </span>ms<span class="o">]</span>
<span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">296</span>.1<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">344</span>.9<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs
Benchmark<span class="w"> </span><span class="m">2</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>jq.test
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">355</span>.8<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">1</span>.1<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">348</span>.8<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">27</span>.7<span class="w"> </span>ms<span class="o">]</span>
<span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">354</span>.8<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">358</span>.5<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs
Summary
<span class="w"> </span><span class="s1">'cat large-file.json | ./control '</span>.repo.url<span class="s1">' > control.test'</span><span class="w"> </span>ran
<span class="w"> </span><span class="m">1</span>.15<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.05<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">'cat large-file.json | jq '</span>.repo.url<span class="s1">' > jq.test'</span>
</pre></div>
<p>Now that's surprising! This naive implementation in Go is a bit faster
than standard jq. But our implementation supports a heck of a lot less
than jq. So this benchmark on its own isn't incredibly meaningful.</p>
<p>However, it's a good base for comparing to our next implementation.</p>
<p class="note">
Astute readers may notice that this version doesn't use a buffered
reader from stdin, while the next version will. I tried this version
with and without wrapping stdin in a buffered reader but it didn't
make a meaningful difference. It might be because Go's JSON decoder
does its own buffering. I'm not sure.
</p><p>Let's do the fun implementation.</p>
<h3 id="partial-parsing">Partial parsing</h3><p>Unlike a typical handwritten parser this partial parser is going to
contain almost two parsers. One parser will care exactly about the
structure of JSON. The other parser will only care about reading past
the current value (whether it be a number or string or array or
object, etc.). The path we pass to the parser will be used to decide
whether each value should be fully parsed or partially parsed.</p>
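<p>To make the "only care about reading past the current value" idea concrete, here is a rough sketch of a skipper that works on a byte slice. It is not the implementation this post builds (that one reads from a buffered reader); the names <code>skipValue</code>, <code>skipString</code> and <code>isDelimiter</code> are hypothetical, and the sketch assumes the value starts exactly at the given index with no leading whitespace.</p>

```go
package main

import (
	"errors"
	"fmt"
)

var errUnexpectedEnd = errors.New("unexpected end of input")

// skipString returns the index just past the string that starts at b[i].
func skipString(b []byte, i int) (int, error) {
	for i++; i < len(b); i++ {
		switch b[i] {
		case '\\':
			i++ // Skip the escaped byte so an escaped quote doesn't end the string.
		case '"':
			return i + 1, nil
		}
	}
	return i, errUnexpectedEnd
}

// isDelimiter reports whether c can terminate a number, true, false or null.
func isDelimiter(c byte) bool {
	switch c {
	case ',', '}', ']', ' ', '\t', '\n', '\r':
		return true
	}
	return false
}

// skipValue returns the index just past the JSON value starting at b[i].
// It never builds a Go value and does not validate; it only finds the end.
func skipValue(b []byte, i int) (int, error) {
	if i >= len(b) {
		return i, errUnexpectedEnd
	}
	switch b[i] {
	case '"':
		return skipString(b, i)
	case '{', '[':
		depth := 0
		for i < len(b) {
			switch b[i] {
			case '"':
				// Brackets inside strings must not affect the depth.
				end, err := skipString(b, i)
				if err != nil {
					return end, err
				}
				i = end
				continue
			case '{', '[':
				depth++
			case '}', ']':
				depth--
				if depth == 0 {
					return i + 1, nil
				}
			}
			i++
		}
		return i, errUnexpectedEnd
	default:
		// Number, true, false or null: read until a delimiter.
		for i < len(b) && !isDelimiter(b[i]) {
			i++
		}
		return i, nil
	}
}

func main() {
	doc := []byte(`{"skip":{"deep":[1,2,{"x":"}"}]},"keep":42}`)
	start := len(`{"skip":`) // The value for "skip" starts right after the colon.
	end, err := skipValue(doc, start)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("skipped %q\n", doc[start:end])
}
```

<p>The only state a skipper like this needs is a nesting depth and whether it is inside a string, which is why skipping can be so much cheaper than building maps and slices for values nobody asked for.</p>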
<p class="note">
I'll reiterate: this partial parser is more complex than a typical
handwritten parser. If you are unfamiliar with handwritten JSON
parsers, you may want to take a look
at <a href="https://notes.eatonphil.com/tags/json.html">previous
articles</a> I've written about parsing JSON.
</p><p>The shell of this partial parser is going to look similar to the shell
of the first parser.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"bufio"</span>
<span class="w"> </span><span class="s">"encoding/json"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"io"</span>
<span class="w"> </span><span class="s">"log"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"strconv"</span>
<span class="w"> </span><span class="s">"strings"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">jsonReader</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">read</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="p">}</span>
<span class="o">...</span><span class="w"> </span><span class="nx">TO</span><span class="w"> </span><span class="nx">IMPLEMENT</span><span class="w"> </span><span class="o">...</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="s">"."</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.</span><span class="nx">NewReader</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdin</span><span class="p">)</span>
<span class="w"> </span><span class="nx">enc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdout</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">jr</span><span class="w"> </span><span class="nx">jsonReader</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="kt">any</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">reset</span><span class="p">()</span>
<span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">extractDataFromJsonPath</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Read"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">jr</span><span class="p">.</span><span class="nx">read</span><span class="p">))</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalln</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">enc</span><span class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">val</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalln</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Except instead of using the builtin JSON parser we'll call our own
<code>extractDataFromJsonPath</code> function that handles parsing and extraction
all at once.</p>
<p>Before doing that we'll add a few helper functions. The first one grabs
a byte from a reader and stores the read byte locally (so we can print
out all read bytes if the program fails).</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">jsonReader</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">read</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">readByte</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">ReadByte</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">byte</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">read</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">jr</span><span class="p">.</span><span class="nx">read</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>The <code>reset</code> method clears the recorded <code>read</code> bytes. It gets called before
each object is parsed in the loop in <code>main</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">reset</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">read</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>Now let's get into <code>extractDataFromJsonPath</code>.</p>
<h3 id="extractdatafromjsonpath">extractDataFromJsonPath</h3><p>This is the real parser. It expects a JSON object and walks every key in it,
but it only fully parses the values that the path leads into.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">extractDataFromJsonPath</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">readByte</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Make sure we're actually going into an object</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'{'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Expected opening curly brace, got: '%s'"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="kt">any</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">i</span><span class="o">++</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="c1">// We found the end of the object</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'}'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Key-value pairs must be separated by commas</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">','</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Expected comma between key-value pairs, got: '%s'"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Grab the key</span>
<span class="w"> </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Find a colon separating key from value</span>
<span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">readByte</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">':'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Expected colon, got: '%s'"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Up to this point it looks like any old handwritten parser. There are a
few helpers in there (<code>eatWhitespace</code>, <code>expectString</code>) we'll implement
shortly.</p>
<p>But once we see each key and are ready to look for a value we can
decide if we need to fully parse the value (if the path goes into this
key) or if we can partially parse the value (because the path does not
go into this key).</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// If the key is not the start of this path, skip past this value</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatValue</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Otherwise this is a path we want, grab the value</span>
<span class="w"> </span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectValue</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>And that's it! The core parsing loop is done. The meat now becomes 1)
the <code>eatValue</code> function that partially parses JSON and 2) the
<code>expectValue</code> function that either encounters a scalar value and
returns it or recursively calls <code>extractDataFromJsonPath</code> to enter some new object.</p>
<h4 id="notes-on-helper-naming">Notes on helper naming</h4><p>There are three main kinds of helpers you'll see. <code>expectX</code> helpers
like <code>expectString</code> will return early with an error if they fail to
find what they're looking for. <code>eatX</code> helpers like <code>eatWhitespace</code>
will not return a parsed value (only an error, if any) and will only move the read cursor
forward. And <code>tryX</code> helpers like <code>tryNumber</code> will do the same thing as
<code>expectX</code> helpers but return an additional boolean, so the
caller can decide whether or not to make other attempts at parsing.</p>
<p>But first let's fill in the two helpers we skipped, starting with <code>eatWhitespace</code>.</p>
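<h3 id="eatwhitespace">eatWhitespace</h3><p>This function peeks at the next byte and discards it, repeating for as long as the next byte is whitespace. At the end of input <code>Peek</code> returns <code>io.EOF</code>, which propagates up and is how the loop in <code>main</code> knows to stop.</p>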
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="nx">isWhitespace</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">' '</span><span class="w"> </span><span class="o">||</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\n'</span><span class="w"> </span><span class="o">||</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\t'</span><span class="w"> </span><span class="o">||</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\r'</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isWhitespace</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>That's it! Next we need to fill in <code>expectString</code>.</p>
<h3 id="expectstring">expectString</h3><p>This is a standard handwritten parser helper that looks for a
double quote and keeps collecting bytes until it finds an ending
double quote that is not escaped.</p>
<div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> </span><span class="p">(</span><span class="n">jr</span><span class="w"> </span><span class="o">*</span><span class="n">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="n">expectString</span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">*</span><span class="n">bufio</span><span class="o">.</span><span class="n">Reader</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">[]</span><span class="n">byte</span>
<span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">eatWhitespace</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Look</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">opening</span><span class="w"> </span><span class="n">quote</span>
<span class="w"> </span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">readByte</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s1">'"'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s2">"Expected double quote to start string, got: '</span><span class="si">%s</span><span class="s2">'"</span><span class="p">,</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">b</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="n">byte</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">readByte</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Just</span><span class="w"> </span><span class="n">skip</span>
<span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">byte</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'"'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Overwrite</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">escaped</span><span class="w"> </span><span class="n">double</span><span class="w"> </span><span class="n">quote</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="w"> </span><span class="p">{</span>
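<span class="w"> </span><span class="n">s</span><span class="p">[</span><span class="n">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'"'</span>
<span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span>
<span class="w"> </span><span class="k">continue</span>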
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Otherwise</span><span class="w"> </span><span class="n">it</span><span class="s1">'s the actual end</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">)</span>
<span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">s</span><span class="p">),</span><span class="w"> </span><span class="n">nil</span>
<span class="p">}</span>
</pre></div>
<p>Standard stuff! Now let's get back to those meaty functions we
introduced before, starting with <code>expectValue</code>.</p>
<h3 id="expectvalue">expectValue</h3><p>This function is called by <code>extractDataFromJsonPath</code> when it wants to
fully parse a value.</p>
<p>If we see a left curly brace, the value is an object, so we hand the
reader and path to <code>extractDataFromJsonPath</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">expectValue</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'{'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">extractDataFromJsonPath</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">)</span>
</pre></div>
<p>Otherwise if we see a left bracket we call a new helper
<code>extractArrayDataFromJsonPath</code> which will be almost identical to
<code>extractDataFromJsonPath</code> but for parsing array syntax.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'['</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">extractArrayDataFromJsonPath</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>If the value we're trying to parse isn't an array or object and
there's more of the path left, then we have to return <code>nil</code> because
we can't descend into a scalar value.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// Can't go any further into a path</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">path</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// This value is a scalar but more of the</span>
<span class="w"> </span><span class="c1">// path remains. So this value doesn't</span>
<span class="w"> </span><span class="c1">// contain this path.</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">nil</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Then we try to parse a scalar (numbers, strings, booleans, <code>null</code>) and
ultimately return an error if nothing worked.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">ok</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">tryScalar</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Expected scalar, got: '%s'"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="p">}</span>
</pre></div>
<p>Let's implement <code>tryScalar</code> and its dependencies now. And we'll come
back to <code>extractArrayDataFromJsonPath</code> afterward.</p>
<h3 id="tryscalar">tryScalar</h3><p>The <code>tryScalar</code> function is similar to <code>expectValue</code>. It's called <code>tryScalar</code>
because it's allowed to fail.</p>
<p>We peek at the first byte and, based on it, dispatch to a dedicated
parsing helper.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">tryScalar</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'"'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">val</span><span class="p">),</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'t'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectIdentifier</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="s">"true"</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'f'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectIdentifier</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="s">"false"</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'n'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectIdentifier</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="s">"null"</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">tryNumber</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>This passes control flow to two new functions, <code>expectIdentifier</code> and
<code>tryNumber</code>. Let's do <code>expectIdentifier</code> next.</p>
<h3 id="expectidentifier">expectIdentifier</h3><p>This function reads as many bytes from the reader as the identifier passed to it is long, and returns the given value if they match.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">expectIdentifier</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">,</span><span class="w"> </span><span class="nx">ident</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">ident</span><span class="p">);</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">ReadByte</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">ident</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unknown value: '%s'"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
<p class="note">
Thanks <a href="https://twitter.com/deliberatecoder">Michael
Lynch</a> for pointing out in an earlier version that
<code>expectIdentifier</code> does not need to <code>Peek</code>/<code>Discard</code> but can just
<code>ReadByte</code> instead.
</p><h3 id="trynumber">tryNumber</h3><p>This function tries to parse a number. We'll do a very lazy number
parser that will <em>most likely</em> allow all valid numbers. Internally
we'll call <code>json.Unmarshal</code> on the bytes we build up to do the
conversion itself.</p>
<div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> </span><span class="p">(</span><span class="n">jr</span><span class="w"> </span><span class="o">*</span><span class="n">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="n">tryNumber</span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">*</span><span class="n">bufio</span><span class="o">.</span><span class="n">Reader</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb nb-Type">bool</span><span class="p">,</span><span class="w"> </span><span class="n">any</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="p">[]</span><span class="n">byte</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Loop</span><span class="w"> </span><span class="n">trying</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">find</span><span class="w"> </span><span class="n">all</span><span class="w"> </span><span class="n">number</span><span class="o">-</span><span class="n">like</span><span class="w"> </span><span class="n">characters</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">row</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">bs</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">r</span><span class="o">.</span><span class="n">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="bp">false</span><span class="p">,</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
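<span class="w"> </span><span class="n">isNumberCharacter</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s1">'0'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="s1">'9'</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'e'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'E'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'-'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'+'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'.'</span>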
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">isNumberCharacter</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">number</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="p">)</span>
<span class="w"> </span><span class="n">r</span><span class="o">.</span><span class="n">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">number</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="bp">false</span><span class="p">,</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="n">float64</span>
<span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">json</span><span class="o">.</span><span class="n">Unmarshal</span><span class="p">(</span><span class="n">number</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">n</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="bp">true</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">err</span>
<span class="p">}</span>
</pre></div>
<p>If we can't find a number, that's ok. We'll just say so in the first
return value by returning <code>false</code>.</p>
<h3 id="outstanding-functions">Outstanding functions</h3><p>Ok, we've come a long way building out helper functions. The two
remaining helpers are <code>extractArrayDataFromJsonPath</code> and
<code>eatValue</code>. Let's finish up the real parser functions with
<code>extractArrayDataFromJsonPath</code> before getting to <code>eatValue</code>, the primary
partial parsing function.</p>
<h3 id="extractarraydatafromjsonpath">extractArrayDataFromJsonPath</h3><p>This function is almost identical to <code>extractDataFromJsonPath</code> but
rather than parsing key-value pairs inside curly braces it parses
values inside brackets.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">extractArrayDataFromJsonPath</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Path inside an array must be an integer</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">path</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Look for opening bracket. Make sure we're in an array</span>
<span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">readByte</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'['</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Expected opening bracket, got: '%s'"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="kt">any</span>
<span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">i</span><span class="o">++</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="c1">// Found closing bracket, exit the array</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">']'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Array values must be separated by a comma</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">','</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Expected comma between array values, got: '%s'"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Just like <code>extractDataFromJsonPath</code> it either calls <code>eatValue</code> or
<code>expectValue</code> depending on whether the current index matches the
requested path.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// If this index is not the one the path asks for, skip past this value</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatValue</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectValue</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
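<p>To see the skip-or-parse idea in isolation, here is a minimal sketch
that uses the standard library's streaming <code>encoding/json</code> decoder
instead of the hand-written reader from this post. The helper name
<code>nthElement</code> is made up for illustration. Unlike the function
above, it returns as soon as it finds the requested index and reports
an error if the index is never reached.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nthElement is a hypothetical helper (not from the post). It walks a
// JSON array with the standard library's streaming decoder and only
// builds a Go value for the element at index n.
func nthElement(input string, n int) (any, error) {
	dec := json.NewDecoder(strings.NewReader(input))

	// The first token must be the opening bracket.
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return nil, fmt.Errorf("expected opening bracket, got: %v", tok)
	}

	for i := 0; dec.More(); i++ {
		if i != n {
			// Not the index we want: read past this value without
			// turning it into maps and slices.
			var skipped json.RawMessage
			if err := dec.Decode(&skipped); err != nil {
				return nil, err
			}
			continue
		}
		// The index matches: fully decode this one value.
		var result any
		if err := dec.Decode(&result); err != nil {
			return nil, err
		}
		return result, nil
	}
	return nil, fmt.Errorf("index %d out of range", n)
}

func main() {
	v, err := nthElement(`[10, {"a": [1, 2]}, "x"]`, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(v) // prints: x
}
```

<p>The shape is the same as above: a counter, a cheap skip for every
index that doesn't match, and a full parse for the one that does.</p>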
<p>That's it for full parser functions! Let's do the partial parser,
<code>eatValue</code>.</p>
<h3 id="eatvalue">eatValue</h3><p>This function is simpler than the full parser functions we wrote
before.</p>
<p>First off it looks for the simple case where the value is a scalar.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">eatValue</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="w"> </span><span class="nx">inString</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="kt">byte</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">ok</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">tryScalar</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// It was a scalar, we're done!</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>All it does is read until the value ends.</p>
<p>If the value is not a scalar though we need to read past complete JSON
arrays and/or objects.</p>
<p>To do this we'll simply read through bytes, monitoring a stack of
open and close braces and brackets. If we enter a string we'll skip
all bytes inside the string until the string ends.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// Otherwise it's an array or object</span>
<span class="w"> </span><span class="nx">first</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">first</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">first</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inString</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'"'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'\\'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">inString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Two \\-es cancel each other out</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\\'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\\'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">byte</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">b</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'['</span><span class="p">:</span>
<span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">stack</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">']'</span><span class="p">:</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'['</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unexpected end of array: '%s'"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'{'</span><span class="p">:</span>
<span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">stack</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'}'</span><span class="p">:</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'{'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unexpected end of object: '%s'"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'"'</span><span class="p">:</span>
<span class="w"> </span><span class="nx">inString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="c1">// Closing quote case handled elsewhere, above</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">b</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>And we're finally done with the first pass of the path-aware jq
implementation.</p>
<h3 id="build,-test,-benchmark">Build, test, benchmark</h3><p>Let's initialize a Go module, then build and test it.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>jqgo
$<span class="w"> </span>go<span class="w"> </span>build
$<span class="w"> </span>curl<span class="w"> </span>https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span>-c<span class="w"> </span><span class="s1">'.[]'</span><span class="w"> </span>><span class="w"> </span>large-file.json
$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>jqgo.test
$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>jq.test
$<span class="w"> </span>diff<span class="w"> </span>jq.test<span class="w"> </span>jqgo.test
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">0</span>
</pre></div>
<p>Great! :) Let's benchmark it against jq and the control
implementation.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>hyperfine<span class="w"> </span>--warmup<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="s2">"cat large-file.json | ./control/control '.repo.url' > control.test"</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="s2">"cat large-file.json | ./jqgo '.repo.url' > jqgo.test"</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="s2">"cat large-file.json | jq '.repo.url' > jq.test"</span>
Benchmark<span class="w"> </span><span class="m">1</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control/control<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>control.test
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">302</span>.0<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">3</span>.4<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">283</span>.7<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">53</span>.1<span class="w"> </span>ms<span class="o">]</span>
<span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">297</span>.4<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">309</span>.0<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs
Benchmark<span class="w"> </span><span class="m">2</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>jqgo.test
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">258</span>.8<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>.2<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">230</span>.3<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">47</span>.6<span class="w"> </span>ms<span class="o">]</span>
<span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">256</span>.3<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">262</span>.6<span class="w"> </span>ms<span class="w"> </span><span class="m">11</span><span class="w"> </span>runs
Benchmark<span class="w"> </span><span class="m">3</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>jq.test
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">357</span>.6<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>.9<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">350</span>.0<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">28</span>.3<span class="w"> </span>ms<span class="o">]</span>
<span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">355</span>.0<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">362</span>.9<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs
Summary
<span class="w"> </span><span class="s1">'cat large-file.json | ./jqgo '</span>.repo.url<span class="s1">' > jqgo.test'</span><span class="w"> </span>ran
<span class="w"> </span><span class="m">1</span>.17<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.02<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">'cat large-file.json | ./control/control '</span>.repo.url<span class="s1">' > control.test'</span>
<span class="w"> </span><span class="m">1</span>.38<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.02<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">'cat large-file.json | jq '</span>.repo.url<span class="s1">' > jq.test'</span>
</pre></div>
<p>Now to my surprise we're already beating the non-path-aware control
implementation! When I first wrote the path-aware version, it was
slower than the control. So I had to start performance profiling. For
this blog post I tried to remake the slowest variation I could
remember but I couldn't get it slower than this.</p>
<p>That said, the best version <em>was</em> faster than this so I <em>can</em>
demonstrate the process of profiling to improve performance.</p>
<p>Let's dig in. :)</p>
<h3 id="profiling-in-go">Profiling in Go</h3><p>There are various ways to enable profiling in Go. One way some people
recommend is through the <a href="https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go">built-in benchmark
support</a>
in <code>go test</code>. I don't really like this method though. I prefer to use
<a href="https://github.com/pkg/profile">pkg/profile</a> manually in <code>main.go</code>.</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -9,6 +9,8 @@</span>
<span class="w"> </span> "os"
<span class="w"> </span> "strconv"
<span class="w"> </span> "strings"
<span class="gi">+</span>
<span class="gi">+ "github.com/pkg/profile"</span>
<span class="w"> </span>)
<span class="w"> </span>type jsonReader struct {
<span class="gu">@@ -450,6 +452,7 @@</span>
<span class="w"> </span>}
<span class="w"> </span>func main() {
<span class="gi">+ defer profile.Start().Stop()</span>
<span class="w"> </span> path := strings.Split(os.Args[1], ".")
<span class="w"> </span> if path[0] == "" {
<span class="w"> </span> path = path[1:]
</pre></div>
<p>Build and run:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>tidy
$<span class="w"> </span>go<span class="w"> </span>build
$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>/dev/null
<span class="m">2022</span>/07/11<span class="w"> </span><span class="m">02</span>:38:57<span class="w"> </span>profile:<span class="w"> </span>cpu<span class="w"> </span>profiling<span class="w"> </span>enabled,<span class="w"> </span>/tmp/profile3691177944/cpu.pprof
<span class="m">2022</span>/07/11<span class="w"> </span><span class="m">02</span>:38:58<span class="w"> </span>profile:<span class="w"> </span>cpu<span class="w"> </span>profiling<span class="w"> </span>disabled,<span class="w"> </span>/tmp/profile3691177944/cpu.pprof
</pre></div>
<p>Go can <a href="https://www.honeycomb.io/blog/golang-observability-using-the-new-pprof-web-ui-to-debug-memory-usage/">run a web
server</a>
to visualize the pprof results, but after literally a few years
of trying to figure it out I find the CLI makes more sense to me.</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">tool</span><span class="w"> </span><span class="nx">pprof</span><span class="w"> </span><span class="o">/</span><span class="nx">tmp</span><span class="o">/</span><span class="nx">profile3691177944</span><span class="o">/</span><span class="nx">cpu</span><span class="p">.</span><span class="nx">pprof</span>
<span class="nx">File</span><span class="p">:</span><span class="w"> </span><span class="nx">jqgo</span>
<span class="nx">Type</span><span class="p">:</span><span class="w"> </span><span class="nx">cpu</span>
<span class="nx">Time</span><span class="p">:</span><span class="w"> </span><span class="nx">Jul</span><span class="w"> </span><span class="mi">11</span><span class="p">,</span><span class="w"> </span><span class="mi">2022</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="mi">2</span><span class="p">:</span><span class="mi">38</span><span class="nx">am</span><span class="w"> </span><span class="p">(</span><span class="nx">UTC</span><span class="p">)</span>
<span class="nx">Duration</span><span class="p">:</span><span class="w"> </span><span class="mf">401.63</span><span class="nx">ms</span><span class="p">,</span><span class="w"> </span><span class="nx">Total</span><span class="w"> </span><span class="nx">samples</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">270</span><span class="nx">ms</span><span class="w"> </span><span class="p">(</span><span class="mf">67.23</span><span class="o">%</span><span class="p">)</span>
<span class="nx">Entering</span><span class="w"> </span><span class="nx">interactive</span><span class="w"> </span><span class="nx">mode</span><span class="w"> </span><span class="p">(</span><span class="kd">type</span><span class="w"> </span><span class="s">"help"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">commands</span><span class="p">,</span><span class="w"> </span><span class="s">"o"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">options</span><span class="p">)</span>
<span class="p">(</span><span class="nx">pprof</span><span class="p">)</span>
</pre></div>
<p>Now we run <code>top10</code> to see where we spend the bulk of time.</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nx">pprof</span><span class="p">)</span><span class="w"> </span><span class="nx">top10</span>
<span class="nx">Showing</span><span class="w"> </span><span class="nx">nodes</span><span class="w"> </span><span class="nx">accounting</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="mi">260</span><span class="nx">ms</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="o">%</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="mi">260</span><span class="nx">ms</span><span class="w"> </span><span class="nx">total</span>
<span class="nx">Showing</span><span class="w"> </span><span class="nx">top</span><span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="nx">nodes</span><span class="w"> </span><span class="nx">out</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="mi">31</span>
<span class="w"> </span><span class="nx">flat</span><span class="w"> </span><span class="nx">flat</span><span class="o">%</span><span class="w"> </span><span class="nx">sum</span><span class="o">%</span><span class="w"> </span><span class="nx">cum</span><span class="w"> </span><span class="nx">cum</span><span class="o">%</span>
<span class="w"> </span><span class="mi">90</span><span class="nx">ms</span><span class="w"> </span><span class="mf">34.62</span><span class="o">%</span><span class="w"> </span><span class="mf">34.62</span><span class="o">%</span><span class="w"> </span><span class="mi">230</span><span class="nx">ms</span><span class="w"> </span><span class="mf">88.46</span><span class="o">%</span><span class="w"> </span><span class="nx">main</span><span class="p">.(</span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">).</span><span class="nx">eatValue</span>
<span class="w"> </span><span class="mi">60</span><span class="nx">ms</span><span class="w"> </span><span class="mf">23.08</span><span class="o">%</span><span class="w"> </span><span class="mf">57.69</span><span class="o">%</span><span class="w"> </span><span class="mi">70</span><span class="nx">ms</span><span class="w"> </span><span class="mf">26.92</span><span class="o">%</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.(</span><span class="o">*</span><span class="nx">Reader</span><span class="p">).</span><span class="nx">Peek</span>
<span class="w"> </span><span class="mi">50</span><span class="nx">ms</span><span class="w"> </span><span class="mf">19.23</span><span class="o">%</span><span class="w"> </span><span class="mf">76.92</span><span class="o">%</span><span class="w"> </span><span class="mi">60</span><span class="nx">ms</span><span class="w"> </span><span class="mf">23.08</span><span class="o">%</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.(</span><span class="o">*</span><span class="nx">Reader</span><span class="p">).</span><span class="nx">Discard</span>
<span class="w"> </span><span class="mi">20</span><span class="nx">ms</span><span class="w"> </span><span class="mf">7.69</span><span class="o">%</span><span class="w"> </span><span class="mf">84.62</span><span class="o">%</span><span class="w"> </span><span class="mi">20</span><span class="nx">ms</span><span class="w"> </span><span class="mf">7.69</span><span class="o">%</span><span class="w"> </span><span class="nx">syscall</span><span class="p">.</span><span class="nx">Syscall</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="mf">88.46</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.(</span><span class="o">*</span><span class="nx">Reader</span><span class="p">).</span><span class="nx">Buffered</span><span class="w"> </span><span class="p">(</span><span class="nx">inline</span><span class="p">)</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="mf">92.31</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">main</span><span class="p">.(</span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">).</span><span class="nx">readByte</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="mf">96.15</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">runtime</span><span class="p">.</span><span class="nx">slicebytetostring</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="mi">100</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">runtime</span><span class="p">.</span><span class="nx">stkbucket</span>
<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="mi">0</span><span class="o">%</span><span class="w"> </span><span class="mi">100</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.(</span><span class="o">*</span><span class="nx">Reader</span><span class="p">).</span><span class="nx">fill</span>
<span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="mi">0</span><span class="o">%</span><span class="w"> </span><span class="mi">100</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">encoding</span><span class="o">/</span><span class="nx">json</span><span class="p">.(</span><span class="o">*</span><span class="nx">Encoder</span><span class="p">).</span><span class="nx">Encode</span>
</pre></div>
<p>Now this is weird. Why are <code>Peek</code> and <code>Discard</code> so expensive? And why
are we spending so much time in <code>syscall.Syscall</code>? The entire point of
buffered I/O is to avoid hitting syscalls too frequently.</p>
<p>But since 88% of cumulative time (230ms of the 260ms sampled) is spent in
<code>eatValue</code> and the functions it calls, let's check where in
<code>eatValue</code> that time goes.</p>
<p>Within the <code>pprof</code> REPL we can enter <code>list X</code> where <code>X</code> is a regexp
matching a function name.</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nx">pprof</span><span class="p">)</span><span class="w"> </span><span class="nx">list</span><span class="w"> </span><span class="nx">eatValue</span>
<span class="nx">Total</span><span class="p">:</span><span class="w"> </span><span class="mi">260</span><span class="nx">ms</span>
<span class="nx">ROUTINE</span><span class="w"> </span><span class="o">========================</span><span class="w"> </span><span class="nx">main</span><span class="p">.(</span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">).</span><span class="nx">eatValue</span><span class="w"> </span><span class="nx">in</span><span class="w"> </span><span class="o">/</span><span class="nx">home</span><span class="o">/</span><span class="nx">phil</span><span class="o">/</span><span class="nx">tmp</span><span class="o">/</span><span class="nx">jqgo</span><span class="o">/</span><span class="nx">mainprof</span><span class="p">.</span><span class="k">go</span>
<span class="w"> </span><span class="mi">90</span><span class="nx">ms</span><span class="w"> </span><span class="mi">230</span><span class="nx">ms</span><span class="w"> </span><span class="p">(</span><span class="nx">flat</span><span class="p">,</span><span class="w"> </span><span class="nx">cum</span><span class="p">)</span><span class="w"> </span><span class="mf">88.46</span><span class="o">%</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="nx">Total</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">159</span><span class="p">:</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">160</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">161</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">162</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">163</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">20</span><span class="nx">ms</span><span class="w"> </span><span class="mi">164</span><span class="p">:</span><span class="w"> </span><span class="nx">ok</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">tryScalar</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">165</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">166</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">167</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">168</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">169</span><span class="p">:</span><span class="w"> </span><span class="c1">// It was a scalar, we're done!</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">170</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">171</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">172</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">173</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">174</span><span class="p">:</span><span class="w"> </span><span class="c1">// Otherwise it's an array or object</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">175</span><span class="p">:</span><span class="w"> </span><span class="nx">first</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">176</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">177</span><span class="p">:</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">first</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">178</span><span class="p">:</span><span class="w"> </span><span class="nx">first</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">179</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">60</span><span class="nx">ms</span><span class="w"> </span><span class="mi">180</span><span class="p">:</span><span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">181</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">182</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">183</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">184</span><span class="p">:</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">185</span><span class="p">:</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">186</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inString</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">187</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'"'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'\\'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">188</span><span class="p">:</span><span class="w"> </span><span class="nx">inString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">189</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">190</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">191</span><span class="p">:</span><span class="w"> </span><span class="c1">// Two \\-es cancel eachother out</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">192</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\\'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\\'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">193</span><span class="p">:</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">byte</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">194</span><span class="p">:</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">195</span><span class="p">:</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">b</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">196</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">197</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">60</span><span class="nx">ms</span><span class="w"> </span><span class="mi">198</span><span class="p">:</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">199</span><span class="p">:</span><span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">200</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">201</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">202</span><span class="p">:</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">203</span><span class="p">:</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'['</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">204</span><span class="p">:</span><span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">stack</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">205</span><span class="p">:</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">']'</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">206</span><span class="p">:</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">207</span><span class="p">:</span><span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">208</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'['</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">209</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unexpected end of array: '%s'"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">210</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">211</span><span class="p">:</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'{'</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">212</span><span class="p">:</span><span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">stack</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">213</span><span class="p">:</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'}'</span><span class="p">:</span>
<span class="w"> </span><span class="mi">50</span><span class="nx">ms</span><span class="w"> </span><span class="mi">50</span><span class="nx">ms</span><span class="w"> </span><span class="mi">214</span><span class="p">:</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">215</span><span class="p">:</span><span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">216</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'{'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">217</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unexpected end of object: '%s'"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">218</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">219</span><span class="p">:</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'"'</span><span class="p">:</span>
</pre></div>
<p>So, by rank, we spend the most time in <code>Peek</code> and
<code>Discard</code>. Then in pulling the last item off the stack??? That's
weird. Let's ignore that.</p>
<h3 id="peek-and-discard">Peek and Discard</h3><p>Let's look at <code>Peek</code> in the pprof REPL:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nx">pprof</span><span class="p">)</span><span class="w"> </span><span class="nx">list</span><span class="w"> </span><span class="nx">Peek</span>
<span class="nx">Total</span><span class="p">:</span><span class="w"> </span><span class="mi">260</span><span class="nx">ms</span>
<span class="nx">ROUTINE</span><span class="w"> </span><span class="o">========================</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.(</span><span class="o">*</span><span class="nx">Reader</span><span class="p">).</span><span class="nx">Peek</span><span class="w"> </span><span class="nx">in</span><span class="w"> </span><span class="o">/</span><span class="nx">usr</span><span class="o">/</span><span class="nx">local</span><span class="o">/</span><span class="k">go</span><span class="o">/</span><span class="nx">src</span><span class="o">/</span><span class="nx">bufio</span><span class="o">/</span><span class="nx">bufio</span><span class="p">.</span><span class="k">go</span>
<span class="w"> </span><span class="mi">60</span><span class="nx">ms</span><span class="w"> </span><span class="mi">70</span><span class="nx">ms</span><span class="w"> </span><span class="p">(</span><span class="nx">flat</span><span class="p">,</span><span class="w"> </span><span class="nx">cum</span><span class="p">)</span><span class="w"> </span><span class="mf">26.92</span><span class="o">%</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="nx">Total</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">130</span><span class="p">:</span><span class="c1">// also returns an error explaining why the read is short. The error is</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">131</span><span class="p">:</span><span class="c1">// ErrBufferFull if n is larger than b's buffer size.</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">132</span><span class="p">:</span><span class="c1">//</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">133</span><span class="p">:</span><span class="c1">// Calling Peek prevents a UnreadByte or UnreadRune call from succeeding</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">134</span><span class="p">:</span><span class="c1">// until the next read operation.</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">135</span><span class="p">:</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">Reader</span><span class="p">)</span><span class="w"> </span><span class="nx">Peek</span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">136</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">137</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrNegativeCount</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">138</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">139</span><span class="p">:</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">140</span><span class="p">:</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">lastByte</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">141</span><span class="p">:</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">lastRuneSize</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">142</span><span class="p">:</span>
<span class="w"> </span><span class="mi">20</span><span class="nx">ms</span><span class="w"> </span><span class="mi">20</span><span class="nx">ms</span><span class="w"> </span><span class="mi">143</span><span class="p">:</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">w</span><span class="o">-</span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">w</span><span class="o">-</span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">b</span><span class="p">.</span><span class="nx">buf</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">144</span><span class="p">:</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">fill</span><span class="p">()</span><span class="w"> </span><span class="c1">// b.w-b.r < len(b.buf) => buffer is not full</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">145</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">146</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">147</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">b</span><span class="p">.</span><span class="nx">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">148</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span class="p">:</span><span class="nx">b</span><span class="p">.</span><span class="nx">w</span><span class="p">],</span><span class="w"> </span><span class="nx">ErrBufferFull</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">149</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">150</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">151</span><span class="p">:</span><span class="w"> </span><span class="c1">// 0 <= n <= len(b.buf)</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">152</span><span class="p">:</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">153</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">avail</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">w</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span class="p">;</span><span class="w"> </span><span class="nx">avail</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">154</span><span class="p">:</span><span class="w"> </span><span class="c1">// not enough data in buffer</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">155</span><span class="p">:</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">avail</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">156</span><span class="p">:</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">readErr</span><span class="p">()</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">157</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">158</span><span class="p">:</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ErrBufferFull</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">159</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">160</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">161</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span class="o">+</span><span class="nx">n</span><span class="p">],</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">162</span><span class="p">:}</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">163</span><span class="p">:</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">164</span><span class="p">:</span><span class="c1">// Discard skips the next n bytes, returning the number of bytes discarded.</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">165</span><span class="p">:</span><span class="c1">//</span>
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">166</span><span class="p">:</span><span class="c1">// If Discard skips fewer than n bytes, it also returns an error.</span>
</pre></div>
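<p>Only 10ms of the 70ms here is spent refilling the buffer (the <code>fill</code>
method). The rest is spread across <code>Peek</code>'s own bookkeeping, so the cost
seems to come from how often we call it: while <code>bufio.Reader</code> buffers the
underlying <em>reads</em>, every single-byte <em>peek</em> still pays for a full call.</p>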
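<p>A quick way to sanity-check where <code>Peek</code> gets its bytes is to count
reads on the underlying source. This is a minimal sketch, not code from
jqgo: <code>countingReader</code> and <code>countReads</code> are hypothetical names, and
the 64-byte input and 16-byte buffer are arbitrary choices.</p>

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// countingReader wraps an io.Reader and counts calls to Read.
// (Hypothetical helper for this sketch, not part of jqgo.)
type countingReader struct {
	r     io.Reader
	reads int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.reads++
	return c.r.Read(p)
}

// countReads peeks and discards one byte at a time over an input of
// inputSize bytes and returns how many times the underlying reader was read.
func countReads(inputSize, bufSize int) int {
	src := &countingReader{r: strings.NewReader(strings.Repeat("x", inputSize))}
	r := bufio.NewReaderSize(src, bufSize)
	for i := 0; i < inputSize; i++ {
		if _, err := r.Peek(1); err != nil {
			break
		}
		if _, err := r.Discard(1); err != nil {
			break
		}
	}
	return src.reads
}

func main() {
	fmt.Println("peeks:", 64, "underlying reads:", countReads(64, 16))
}
```

<p>Comparing the number of underlying reads with the number of
<code>Peek</code> calls gives a rough idea of how much of the per-byte cost is I/O
and how much is call overhead.</p>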
<p>But hey, we were peeking and discarding one at a time anyway. Peeking
and discarding were the same cost in <code>eatValue</code>. So let's ignore
peeking for a second and think about discarding.</p>
<p>We could avoid doing so many discards if we keep track of how many
bytes we have peeked at in the loop and only discard once at the end of
the loop. (As an implementation detail, since the internal buffer has a
maximum size, we still need to discard whenever a peek fails with a
"buffer full" error.)</p>
<p>And based on that <code>top10</code> result above, we need to do this in
<code>eatValue</code>.</p>
<div class="highlight"><pre><span></span><span class="gu">@@ -170,16 +170,31 @@</span>
<span class="w"> </span> }
<span class="w"> </span> // Otherwise it's an array or object
<span class="gi">+ length := 0</span>
<span class="w"> </span> first := true
<span class="gd">-</span>
<span class="gi">+ var bs []byte</span>
<span class="w"> </span> for first || len(stack) > 0 {
<span class="gi">+ length++</span>
<span class="w"> </span> first = false
<span class="gd">- bs, err := r.Peek(1)</span>
<span class="gd">- if err != nil {</span>
<span class="gd">- return err</span>
<span class="gi">+ for {</span>
<span class="gi">+ bs, err = r.Peek(length)</span>
<span class="gi">+ if err == bufio.ErrBufferFull {</span>
<span class="gi">+ _, err = r.Discard(length - 1)</span>
<span class="gi">+ if err != nil {</span>
<span class="gi">+ return err</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ length = 1</span>
<span class="gi">+ continue</span>
<span class="gi">+ }</span>
<span class="gi">+ if err != nil {</span>
<span class="gi">+ return err</span>
<span class="gi">+ }</span>
<span class="gi">+</span>
<span class="gi">+ break</span>
<span class="w"> </span> }
<span class="gd">- b := bs[0]</span>
<span class="gi">+ b := bs[length-1]</span>
<span class="w"> </span> if inString {
<span class="w"> </span> if b == '"' && prev != '\\' {
<span class="gu">@@ -193,7 +208,6 @@</span>
<span class="w"> </span> prev = b
<span class="w"> </span> }
<span class="gd">- r.Discard(1)</span>
<span class="w"> </span> continue
<span class="w"> </span> }
<span class="gu">@@ -219,11 +233,11 @@</span>
<span class="w"> </span> // Closing quote case handled elsewhere, above
<span class="w"> </span> }
<span class="gd">- r.Discard(1)</span>
<span class="w"> </span> prev = b
<span class="w"> </span> }
<span class="gd">- return nil</span>
<span class="gi">+ _, err = r.Discard(length)</span>
<span class="gi">+ return err</span>
<span class="w"> </span>}
<span class="w"> </span>func (jr *jsonReader) tryScalar(r *bufio.Reader) (bool, any, error) {
</pre></div>
<p>Comment out the <code>pkg/profile</code> bits (profiling slows the whole thing down), rebuild, and rerun:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>hyperfine<span class="w"> </span>--warmup<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="s2">"cat large-file.json | ./control/control '.repo.url' > control.test"</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="s2">"cat large-file.json | ./jqgo '.repo.url' > jqgo.test"</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="s2">"cat large-file.json | jq '.repo.url' > jq.test"</span>
Benchmark<span class="w"> </span><span class="m">1</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control/control<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>control.test
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">302</span>.0<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">4</span>.2<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">287</span>.7<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">49</span>.7<span class="w"> </span>ms<span class="o">]</span>
<span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">296</span>.6<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">308</span>.2<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs
Benchmark<span class="w"> </span><span class="m">2</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>jqgo.test
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">215</span>.0<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">1</span>.6<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">189</span>.1<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">46</span>.9<span class="w"> </span>ms<span class="o">]</span>
<span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">213</span>.5<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">218</span>.7<span class="w"> </span>ms<span class="w"> </span><span class="m">13</span><span class="w"> </span>runs
Benchmark<span class="w"> </span><span class="m">3</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>jq.test
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">355</span>.7<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">1</span>.4<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">349</span>.9<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">26</span>.4<span class="w"> </span>ms<span class="o">]</span>
<span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">354</span>.3<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">359</span>.1<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs
Summary
<span class="w"> </span><span class="s1">'cat large-file.json | ./jqgo '</span>.repo.url<span class="s1">' > jqgo.test'</span><span class="w"> </span>ran
<span class="w"> </span><span class="m">1</span>.40<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.02<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">'cat large-file.json | ./control/control '</span>.repo.url<span class="s1">' > control.test'</span>
<span class="w"> </span><span class="m">1</span>.65<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.01<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">'cat large-file.json | jq '</span>.repo.url<span class="s1">' > jq.test'</span>
</pre></div>
<p>Great! We've shaved off another 40ms. Let's enable profiling,
re-run the program and go back into the pprof REPL.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>/dev/null
<span class="m">2022</span>/07/11<span class="w"> </span><span class="m">03</span>:12:07<span class="w"> </span>profile:<span class="w"> </span>cpu<span class="w"> </span>profiling<span class="w"> </span>enabled,<span class="w"> </span>/tmp/profile2229743747/cpu.pprof
<span class="m">2022</span>/07/11<span class="w"> </span><span class="m">03</span>:12:07<span class="w"> </span>profile:<span class="w"> </span>cpu<span class="w"> </span>profiling<span class="w"> </span>disabled,<span class="w"> </span>/tmp/profile2229743747/cpu.pprof
$<span class="w"> </span>go<span class="w"> </span>tool<span class="w"> </span>pprof<span class="w"> </span>/tmp/profile2229743747/cpu.pprof
File:<span class="w"> </span>jqgo
Type:<span class="w"> </span>cpu
Time:<span class="w"> </span>Jul<span class="w"> </span><span class="m">11</span>,<span class="w"> </span><span class="m">2022</span><span class="w"> </span>at<span class="w"> </span><span class="m">3</span>:12am<span class="w"> </span><span class="o">(</span>UTC<span class="o">)</span>
Duration:<span class="w"> </span><span class="m">401</span>.33ms,<span class="w"> </span>Total<span class="w"> </span><span class="nv">samples</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>210ms<span class="w"> </span><span class="o">(</span><span class="m">52</span>.33%<span class="o">)</span>
Entering<span class="w"> </span>interactive<span class="w"> </span>mode<span class="w"> </span><span class="o">(</span><span class="nb">type</span><span class="w"> </span><span class="s2">"help"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>commands,<span class="w"> </span><span class="s2">"o"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>options<span class="o">)</span>
<span class="o">(</span>pprof<span class="o">)</span><span class="w"> </span>top10
Showing<span class="w"> </span>nodes<span class="w"> </span>accounting<span class="w"> </span><span class="k">for</span><span class="w"> </span>210ms,<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>of<span class="w"> </span>210ms<span class="w"> </span>total
Showing<span class="w"> </span>top<span class="w"> </span><span class="m">10</span><span class="w"> </span>nodes<span class="w"> </span>out<span class="w"> </span>of<span class="w"> </span><span class="m">20</span>
<span class="w"> </span>flat<span class="w"> </span>flat%<span class="w"> </span>sum%<span class="w"> </span>cum<span class="w"> </span>cum%
<span class="w"> </span>100ms<span class="w"> </span><span class="m">47</span>.62%<span class="w"> </span><span class="m">47</span>.62%<span class="w"> </span>180ms<span class="w"> </span><span class="m">85</span>.71%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.eatValue
<span class="w"> </span>70ms<span class="w"> </span><span class="m">33</span>.33%<span class="w"> </span><span class="m">80</span>.95%<span class="w"> </span>70ms<span class="w"> </span><span class="m">33</span>.33%<span class="w"> </span>bufio.<span class="o">(</span>*Reader<span class="o">)</span>.Peek
<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span><span class="m">85</span>.71%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>encoding/json.<span class="o">(</span>*encodeState<span class="o">)</span>.string
<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span><span class="m">90</span>.48%<span class="w"> </span>20ms<span class="w"> </span><span class="m">9</span>.52%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.expectString
<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span><span class="m">95</span>.24%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.readByte
<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>reflect.Value.Type
<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>%<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>encoding/json.<span class="o">(</span>*Encoder<span class="o">)</span>.Encode
<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>%<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>encoding/json.<span class="o">(</span>*decodeState<span class="o">)</span>.literalStore
<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>%<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>encoding/json.<span class="o">(</span>*decodeState<span class="o">)</span>.unmarshal
<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>%<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>encoding/json.<span class="o">(</span>*decodeState<span class="o">)</span>.value
</pre></div>
<p>Nice, <code>syscall.Syscall</code> is no longer in the top 10. But <code>eatValue</code> is,
and we're still spending a bunch of time in <code>Peek</code>. We didn't try to
call <code>Peek</code> less often; we only cut down on calls to <code>Discard</code>.</p>
<p>List <code>eatValue</code>.</p>
<div class="highlight"><pre><span></span><span class="o">(</span>pprof<span class="o">)</span><span class="w"> </span>list<span class="w"> </span>eatValue
Total:<span class="w"> </span>210ms
<span class="nv">ROUTINE</span><span class="w"> </span><span class="o">========================</span><span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.eatValue<span class="w"> </span><span class="k">in</span><span class="w"> </span>/home/phil/tmp/jqgo/mainpeek.go
<span class="w"> </span>100ms<span class="w"> </span>180ms<span class="w"> </span><span class="o">(</span>flat,<span class="w"> </span>cum<span class="o">)</span><span class="w"> </span><span class="m">85</span>.71%<span class="w"> </span>of<span class="w"> </span>Total
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">159</span>:<span class="w"> </span>err<span class="w"> </span>:<span class="o">=</span><span class="w"> </span>jr.eatWhitespace<span class="o">(</span>r<span class="o">)</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">160</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>err<span class="w"> </span>!<span class="o">=</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">161</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>err
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">162</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">163</span>:
<span class="w"> </span>.<span class="w"> </span>20ms<span class="w"> </span><span class="m">164</span>:<span class="w"> </span>ok,<span class="w"> </span>_,<span class="w"> </span>err<span class="w"> </span>:<span class="o">=</span><span class="w"> </span>jr.tryScalar<span class="o">(</span>r<span class="o">)</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">165</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>err<span class="w"> </span>!<span class="o">=</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">166</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>err
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">167</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">168</span>:
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">169</span>:<span class="w"> </span>//<span class="w"> </span>It<span class="w"> </span>was<span class="w"> </span>a<span class="w"> </span>scalar,<span class="w"> </span>we<span class="s1">'re done!</span>
<span class="s1"> . . 170: if ok {</span>
<span class="s1"> . . 171: return nil</span>
<span class="s1"> . . 172: }</span>
<span class="s1"> . . 173:</span>
<span class="s1"> . . 174: // Otherwise it'</span>s<span class="w"> </span>an<span class="w"> </span>array<span class="w"> </span>or<span class="w"> </span>object
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">175</span>:<span class="w"> </span>length<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">176</span>:<span class="w"> </span>first<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="nb">true</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">177</span>:<span class="w"> </span>var<span class="w"> </span>bs<span class="w"> </span><span class="o">[]</span>byte
<span class="w"> </span>20ms<span class="w"> </span>20ms<span class="w"> </span><span class="m">178</span>:<span class="w"> </span><span class="k">for</span><span class="w"> </span>first<span class="w"> </span><span class="o">||</span><span class="w"> </span>len<span class="o">(</span>stack<span class="o">)</span><span class="w"> </span>><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">179</span>:<span class="w"> </span>length++
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">180</span>:<span class="w"> </span><span class="nv">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">181</span>:
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">182</span>:<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>20ms<span class="w"> </span>80ms<span class="w"> </span><span class="m">183</span>:<span class="w"> </span>bs,<span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>r.Peek<span class="o">(</span>length<span class="o">)</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">184</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span>bufio.ErrBufferFull<span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">185</span>:<span class="w"> </span>_,<span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>r.Discard<span class="o">(</span>length<span class="w"> </span>-<span class="w"> </span><span class="m">1</span><span class="o">)</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">186</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>err<span class="w"> </span>!<span class="o">=</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">187</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>err
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">188</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">189</span>:
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">190</span>:<span class="w"> </span><span class="nv">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">191</span>:<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">192</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">193</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>err<span class="w"> </span>!<span class="o">=</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">194</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>err
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">195</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">196</span>:
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">197</span>:<span class="w"> </span><span class="k">break</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">198</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">199</span>:<span class="w"> </span>b<span class="w"> </span>:<span class="o">=</span><span class="w"> </span>bs<span class="o">[</span>length-1<span class="o">]</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">200</span>:
<span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">201</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>inString<span class="w"> </span><span class="o">{</span>
<span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">202</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'"'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span>prev<span class="w"> </span>!<span class="o">=</span><span class="w"> </span><span class="s1">'\\'</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="m">203</span>:<span class="w"> </span><span class="nv">inString</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">204</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">205</span>:
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">206</span>:<span class="w"> </span>//<span class="w"> </span>Two<span class="w"> </span><span class="se">\\</span>-es<span class="w"> </span>cancel<span class="w"> </span>eachother<span class="w"> </span>out
<span class="w"> </span>20ms<span class="w"> </span>20ms<span class="w"> </span><span class="m">207</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'\\'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nv">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'\\'</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">208</span>:<span class="w"> </span><span class="nv">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>byte<span class="o">(</span><span class="m">0</span><span class="o">)</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">209</span>:<span class="w"> </span><span class="o">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">210</span>:<span class="w"> </span><span class="nv">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>b
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">211</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>.
</pre></div>
<p>The bulk of time is spent in <code>Peek</code>. Let's list <code>Peek</code> again.</p>
<div class="highlight"><pre><span></span><span class="o">(</span>pprof<span class="o">)</span><span class="w"> </span>list<span class="w"> </span>Peek
Total:<span class="w"> </span>210ms
<span class="nv">ROUTINE</span><span class="w"> </span><span class="o">========================</span><span class="w"> </span>bufio.<span class="o">(</span>*Reader<span class="o">)</span>.Peek<span class="w"> </span><span class="k">in</span><span class="w"> </span>/usr/local/go/src/bufio/bufio.go
<span class="w"> </span>70ms<span class="w"> </span>70ms<span class="w"> </span><span class="o">(</span>flat,<span class="w"> </span>cum<span class="o">)</span><span class="w"> </span><span class="m">33</span>.33%<span class="w"> </span>of<span class="w"> </span>Total
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">130</span>://<span class="w"> </span>also<span class="w"> </span>returns<span class="w"> </span>an<span class="w"> </span>error<span class="w"> </span>explaining<span class="w"> </span>why<span class="w"> </span>the<span class="w"> </span><span class="nb">read</span><span class="w"> </span>is<span class="w"> </span>short.<span class="w"> </span>The<span class="w"> </span>error<span class="w"> </span>is
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">131</span>://<span class="w"> </span>ErrBufferFull<span class="w"> </span><span class="k">if</span><span class="w"> </span>n<span class="w"> </span>is<span class="w"> </span>larger<span class="w"> </span>than<span class="w"> </span>b<span class="err">'</span>s<span class="w"> </span>buffer<span class="w"> </span>size.
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">132</span>://
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">133</span>://<span class="w"> </span>Calling<span class="w"> </span>Peek<span class="w"> </span>prevents<span class="w"> </span>a<span class="w"> </span>UnreadByte<span class="w"> </span>or<span class="w"> </span>UnreadRune<span class="w"> </span>call<span class="w"> </span>from<span class="w"> </span>succeeding
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">134</span>://<span class="w"> </span><span class="k">until</span><span class="w"> </span>the<span class="w"> </span>next<span class="w"> </span><span class="nb">read</span><span class="w"> </span>operation.
<span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">135</span>:func<span class="w"> </span><span class="o">(</span>b<span class="w"> </span>*Reader<span class="o">)</span><span class="w"> </span>Peek<span class="o">(</span>n<span class="w"> </span>int<span class="o">)</span><span class="w"> </span><span class="o">([]</span>byte,<span class="w"> </span>error<span class="o">)</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">136</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>n<span class="w"> </span><<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">137</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>nil,<span class="w"> </span>ErrNegativeCount
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">138</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">139</span>:
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">140</span>:<span class="w"> </span>b.lastByte<span class="w"> </span><span class="o">=</span><span class="w"> </span>-1
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">141</span>:<span class="w"> </span>b.lastRuneSize<span class="w"> </span><span class="o">=</span><span class="w"> </span>-1
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">142</span>:
<span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">143</span>:<span class="w"> </span><span class="k">for</span><span class="w"> </span>b.w-b.r<span class="w"> </span><<span class="w"> </span>n<span class="w"> </span><span class="o">&&</span><span class="w"> </span>b.w-b.r<span class="w"> </span><<span class="w"> </span>len<span class="o">(</span>b.buf<span class="o">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span>b.err<span class="w"> </span><span class="o">==</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">144</span>:<span class="w"> </span>b.fill<span class="o">()</span><span class="w"> </span>//<span class="w"> </span>b.w-b.r<span class="w"> </span><<span class="w"> </span>len<span class="o">(</span>b.buf<span class="o">)</span><span class="w"> </span><span class="o">=</span>><span class="w"> </span>buffer<span class="w"> </span>is<span class="w"> </span>not<span class="w"> </span>full
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">145</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">146</span>:
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">147</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>n<span class="w"> </span>><span class="w"> </span>len<span class="o">(</span>b.buf<span class="o">)</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">148</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>b.buf<span class="o">[</span>b.r:b.w<span class="o">]</span>,<span class="w"> </span>ErrBufferFull
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">149</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">150</span>:
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">151</span>:<span class="w"> </span>//<span class="w"> </span><span class="m">0</span><span class="w"> </span><<span class="o">=</span><span class="w"> </span>n<span class="w"> </span><<span class="o">=</span><span class="w"> </span>len<span class="o">(</span>b.buf<span class="o">)</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">152</span>:<span class="w"> </span>var<span class="w"> </span>err<span class="w"> </span>error
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">153</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>avail<span class="w"> </span>:<span class="o">=</span><span class="w"> </span>b.w<span class="w"> </span>-<span class="w"> </span>b.r<span class="p">;</span><span class="w"> </span>avail<span class="w"> </span><<span class="w"> </span>n<span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">154</span>:<span class="w"> </span>//<span class="w"> </span>not<span class="w"> </span>enough<span class="w"> </span>data<span class="w"> </span><span class="k">in</span><span class="w"> </span>buffer
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">155</span>:<span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>avail
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">156</span>:<span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>b.readErr<span class="o">()</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">157</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">158</span>:<span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>ErrBufferFull
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">159</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">160</span>:<span class="w"> </span><span class="o">}</span>
<span class="w"> </span>50ms<span class="w"> </span>50ms<span class="w"> </span><span class="m">161</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>b.buf<span class="o">[</span>b.r<span class="w"> </span>:<span class="w"> </span>b.r+n<span class="o">]</span>,<span class="w"> </span>err
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">162</span>:<span class="o">}</span>
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">163</span>:
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">164</span>://<span class="w"> </span>Discard<span class="w"> </span>skips<span class="w"> </span>the<span class="w"> </span>next<span class="w"> </span>n<span class="w"> </span>bytes,<span class="w"> </span>returning<span class="w"> </span>the<span class="w"> </span>number<span class="w"> </span>of<span class="w"> </span>bytes<span class="w"> </span>discarded.
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">165</span>://
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">166</span>://<span class="w"> </span>If<span class="w"> </span>Discard<span class="w"> </span>skips<span class="w"> </span>fewer<span class="w"> </span>than<span class="w"> </span>n<span class="w"> </span>bytes,<span class="w"> </span>it<span class="w"> </span>also<span class="w"> </span>returns<span class="w"> </span>an<span class="w"> </span>error.
</pre></div>
<p>Well, it's not really clear to me from this why 50ms of the 70ms in
<code>Peek</code> is attributed to the slicing in the final <code>return</code> statement.</p>
<p>We might be able to use <code>Peek</code> much less if we kept our own FIFO queue
of peeked-at bytes. But I don't feel like writing a correct, efficient
FIFO queue (a ring buffer, basically) and maybe there are other
aspects of this program we can look at. So let's give this train of
thought a break.</p>
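<p>For reference, here is a rough sketch of what such a ring buffer
could look like. It is not code from this program: the <code>byteRing</code> type
and its <code>push</code>, <code>pop</code> and <code>peekAt</code> methods are hypothetical names, the
capacity is fixed, and nothing here has been wired into <code>eatValue</code>.</p>

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errFull  = errors.New("ring buffer is full")
	errEmpty = errors.New("ring buffer is empty")
)

// byteRing is a fixed-capacity FIFO queue of bytes (hypothetical type,
// not part of the program in this post).
type byteRing struct {
	buf   []byte
	head  int // index of the oldest byte
	count int // number of bytes currently stored
}

func newByteRing(capacity int) *byteRing {
	return &byteRing{buf: make([]byte, capacity)}
}

// push appends b at the tail, wrapping around the end of buf.
func (q *byteRing) push(b byte) error {
	if q.count == len(q.buf) {
		return errFull
	}
	q.buf[(q.head+q.count)%len(q.buf)] = b
	q.count++
	return nil
}

// pop removes and returns the oldest byte.
func (q *byteRing) pop() (byte, error) {
	if q.count == 0 {
		return 0, errEmpty
	}
	b := q.buf[q.head]
	q.head = (q.head + 1) % len(q.buf)
	q.count--
	return b, nil
}

// peekAt returns the i-th oldest byte without removing it.
func (q *byteRing) peekAt(i int) (byte, bool) {
	if i < 0 || i >= q.count {
		return 0, false
	}
	return q.buf[(q.head+i)%len(q.buf)], true
}

func main() {
	q := newByteRing(3)
	for _, b := range []byte("abc") {
		_ = q.push(b)
	}
	oldest, _ := q.pop() // frees one slot at the front
	_ = q.push('d')      // lands in the freed slot, wrapping around
	last, _ := q.peekAt(2)
	fmt.Printf("popped %c, newest %c\n", oldest, last)
}
```

<p>The part that takes care to get right is everything this sketch leaves
out: refilling from the underlying reader, growing when a value is
longer than the capacity, and doing it all without extra copies.</p>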
<h3 id="memory-profiling">Memory profiling</h3><p>Let's change tactics entirely. Memory allocation tends to be
expensive. Allocating in a loop is generally a bad idea. And this
entire program is a loop. So let's try doing a memory profile instead
of a CPU profile.</p>
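<p>To make the "allocating in a loop" point concrete, here is a small
standalone sketch, separate from the program in this post. It uses
<code>testing.AllocsPerRun</code> from the standard library to compare making a
fresh slice on every call against reusing one buffer. The function
names <code>allocEachTime</code> and <code>reuse</code> and the 64-byte size are made up for
illustration.</p>

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the compiler cannot optimize the
// allocations away.
var sink []byte

// allocEachTime makes a fresh slice on every call, the way a parser
// might if it built a new buffer for each token.
func allocEachTime(n int) {
	sink = make([]byte, n)
}

// reuse truncates a caller-owned buffer and fills it again, so no new
// backing array is needed as long as the capacity is large enough.
func reuse(buf []byte, n int) []byte {
	buf = buf[:0]
	for i := 0; i < n; i++ {
		buf = append(buf, 'x')
	}
	sink = buf
	return buf
}

func main() {
	const size = 64

	fresh := testing.AllocsPerRun(1000, func() { allocEachTime(size) })

	buf := make([]byte, 0, size)
	reused := testing.AllocsPerRun(1000, func() { buf = reuse(buf, size) })

	fmt.Printf("fresh slice: %.0f allocs per call\n", fresh)
	fmt.Printf("reused buffer: %.0f allocs per call\n", reused)
}
```

<p>A count like this only says how often we allocate, not where. For that
we need a profile.</p>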
<p>Instead of <code>defer profile.Start().Stop()</code> we'll set <code>defer
profile.Start(profile.MemProfile).Stop()</code>.</p>
<p>Build, rerun and enter pprof with the <code>-alloc_objects</code> flag. We want to
see where allocations are happening, counted per object allocated.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build
$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>/dev/null
<span class="m">2022</span>/07/11<span class="w"> </span><span class="m">03</span>:24:55<span class="w"> </span>profile:<span class="w"> </span>memory<span class="w"> </span>profiling<span class="w"> </span>enabled<span class="w"> </span><span class="o">(</span>rate<span class="w"> </span><span class="m">4096</span><span class="o">)</span>,<span class="w"> </span>/tmp/profile1407859643/mem.pprof
<span class="m">2022</span>/07/11<span class="w"> </span><span class="m">03</span>:24:56<span class="w"> </span>profile:<span class="w"> </span>memory<span class="w"> </span>profiling<span class="w"> </span>disabled,<span class="w"> </span>/tmp/profile1407859643/mem.pprof
$<span class="w"> </span>go<span class="w"> </span>tool<span class="w"> </span>pprof<span class="w"> </span>-alloc_objects<span class="w"> </span>/tmp/profile1407859643/mem.pprof
File:<span class="w"> </span>jqgo
Type:<span class="w"> </span>alloc_objects
Time:<span class="w"> </span>Jul<span class="w"> </span><span class="m">11</span>,<span class="w"> </span><span class="m">2022</span><span class="w"> </span>at<span class="w"> </span><span class="m">3</span>:24am<span class="w"> </span><span class="o">(</span>UTC<span class="o">)</span>
Entering<span class="w"> </span>interactive<span class="w"> </span>mode<span class="w"> </span><span class="o">(</span><span class="nb">type</span><span class="w"> </span><span class="s2">"help"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>commands,<span class="w"> </span><span class="s2">"o"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>options<span class="o">)</span>
<span class="o">(</span>pprof<span class="o">)</span><span class="w"> </span>top10
Showing<span class="w"> </span>nodes<span class="w"> </span>accounting<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="m">365899</span>,<span class="w"> </span><span class="m">99</span>.95%<span class="w"> </span>of<span class="w"> </span><span class="m">366086</span><span class="w"> </span>total
Dropped<span class="w"> </span><span class="m">24</span><span class="w"> </span>nodes<span class="w"> </span><span class="o">(</span>cum<span class="w"> </span><<span class="o">=</span><span class="w"> </span><span class="m">1830</span><span class="o">)</span>
Showing<span class="w"> </span>top<span class="w"> </span><span class="m">10</span><span class="w"> </span>nodes<span class="w"> </span>out<span class="w"> </span>of<span class="w"> </span><span class="m">14</span>
<span class="w"> </span>flat<span class="w"> </span>flat%<span class="w"> </span>sum%<span class="w"> </span>cum<span class="w"> </span>cum%
<span class="w"> </span><span class="m">227585</span><span class="w"> </span><span class="m">62</span>.17%<span class="w"> </span><span class="m">62</span>.17%<span class="w"> </span><span class="m">262708</span><span class="w"> </span><span class="m">71</span>.76%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.expectString
<span class="w"> </span><span class="m">40945</span><span class="w"> </span><span class="m">11</span>.18%<span class="w"> </span><span class="m">73</span>.35%<span class="w"> </span><span class="m">40945</span><span class="w"> </span><span class="m">11</span>.18%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.readByte
<span class="w"> </span><span class="m">39500</span><span class="w"> </span><span class="m">10</span>.79%<span class="w"> </span><span class="m">84</span>.14%<span class="w"> </span><span class="m">252585</span><span class="w"> </span><span class="m">69</span>.00%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.tryScalar
<span class="w"> </span><span class="m">30009</span><span class="w"> </span><span class="m">8</span>.20%<span class="w"> </span><span class="m">92</span>.34%<span class="w"> </span><span class="m">41924</span><span class="w"> </span><span class="m">11</span>.45%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.tryNumber
<span class="w"> </span><span class="m">12055</span><span class="w"> </span><span class="m">3</span>.29%<span class="w"> </span><span class="m">95</span>.63%<span class="w"> </span><span class="m">215416</span><span class="w"> </span><span class="m">58</span>.84%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.eatValue
<span class="w"> </span><span class="m">7555</span><span class="w"> </span><span class="m">2</span>.06%<span class="w"> </span><span class="m">97</span>.70%<span class="w"> </span><span class="m">11915</span><span class="w"> </span><span class="m">3</span>.25%<span class="w"> </span>encoding/json.Unmarshal
<span class="w"> </span><span class="m">4360</span><span class="w"> </span><span class="m">1</span>.19%<span class="w"> </span><span class="m">98</span>.89%<span class="w"> </span><span class="m">4360</span><span class="w"> </span><span class="m">1</span>.19%<span class="w"> </span>encoding/json.<span class="o">(</span>*decodeState<span class="o">)</span>.literalStore
<span class="w"> </span><span class="m">3847</span><span class="w"> </span><span class="m">1</span>.05%<span class="w"> </span><span class="m">99</span>.94%<span class="w"> </span><span class="m">3847</span><span class="w"> </span><span class="m">1</span>.05%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.expectIdentifier
<span class="w"> </span><span class="m">43</span><span class="w"> </span><span class="m">0</span>.012%<span class="w"> </span><span class="m">99</span>.95%<span class="w"> </span><span class="m">365931</span><span class="w"> </span><span class="m">100</span>%<span class="w"> </span>runtime.main
<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>%<span class="w"> </span><span class="m">99</span>.95%<span class="w"> </span><span class="m">4360</span><span class="w"> </span><span class="m">1</span>.19%<span class="w"> </span>encoding/json.<span class="o">(</span>*decodeState<span class="o">)</span>.unmarshal
</pre></div>
<p>And just like in the CPU profile we can list functions to see where
the allocations happen in code. Let's list the function with the most
allocations here, <code>expectString</code>.</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="n">pprof</span><span class="p">)</span><span class="w"> </span><span class="n">list</span><span class="w"> </span><span class="n">expectString</span>
<span class="n">Total</span><span class="p">:</span><span class="w"> </span><span class="mi">366086</span>
<span class="n">ROUTINE</span><span class="w"> </span><span class="o">========================</span><span class="w"> </span><span class="n">main</span><span class="o">.</span><span class="p">(</span><span class="o">*</span><span class="n">jsonReader</span><span class="p">)</span><span class="o">.</span><span class="n">expectString</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="o">/</span><span class="n">home</span><span class="o">/</span><span class="n">phil</span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">jqgo</span><span class="o">/</span><span class="n">mainpeek</span><span class="o">.</span><span class="n">go</span>
<span class="w"> </span><span class="mi">227585</span><span class="w"> </span><span class="mi">262708</span><span class="w"> </span><span class="p">(</span><span class="n">flat</span><span class="p">,</span><span class="w"> </span><span class="n">cum</span><span class="p">)</span><span class="w"> </span><span class="mf">71.76</span><span class="o">%</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">Total</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">58</span><span class="p">:</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">eatWhitespace</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">59</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">60</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">err</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">61</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">62</span><span class="p">:</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">4941</span><span class="w"> </span><span class="mi">63</span><span class="p">:</span><span class="w"> </span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">readByte</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">64</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">65</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">err</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">66</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">67</span><span class="p">:</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">68</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s1">'"'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">69</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s2">"Expected double quote to start string, got: '</span><span class="si">%s</span><span class="s2">'"</span><span class="p">,</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">b</span><span class="p">))</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">70</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">71</span><span class="p">:</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">72</span><span class="p">:</span><span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="n">byte</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">73</span><span class="p">:</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">30182</span><span class="w"> </span><span class="mi">74</span><span class="p">:</span><span class="w"> </span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">readByte</span><span class="p">(</span><span class="n">r</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">75</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">76</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">err</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">77</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">78</span><span class="p">:</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">79</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">80</span><span class="p">:</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Just</span><span class="w"> </span><span class="n">skip</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">81</span><span class="p">:</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">byte</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">82</span><span class="p">:</span><span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">83</span><span class="p">:</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'"'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">84</span><span class="p">:</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Overwrite</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">escaped</span><span class="w"> </span><span class="n">double</span><span class="w"> </span><span class="n">quote</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">85</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'</span><span class="se">\\</span><span class="s1">'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">86</span><span class="p">:</span><span class="w"> </span><span class="n">s</span><span class="p">[</span><span class="n">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'"'</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">87</span><span class="p">:</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">88</span><span class="p">:</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Otherwise</span><span class="w"> </span><span class="n">it</span><span class="s1">'s the actual end</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">89</span><span class="p">:</span><span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">90</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">91</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">92</span><span class="p">:</span>
<span class="w"> </span><span class="mi">146302</span><span class="w"> </span><span class="mi">146302</span><span class="w"> </span><span class="mi">93</span><span class="p">:</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">)</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">94</span><span class="p">:</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">95</span><span class="p">:</span><span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">96</span><span class="p">:</span>
<span class="w"> </span><span class="mi">81283</span><span class="w"> </span><span class="mi">81283</span><span class="w"> </span><span class="mi">97</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">s</span><span class="p">),</span><span class="w"> </span><span class="n">nil</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">98</span><span class="p">:}</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">99</span><span class="p">:</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">100</span><span class="p">:</span><span class="k">func</span><span class="w"> </span><span class="p">(</span><span class="n">jr</span><span class="w"> </span><span class="o">*</span><span class="n">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="n">expectIdentifier</span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">*</span><span class="n">bufio</span><span class="o">.</span><span class="n">Reader</span><span class="p">,</span><span class="w"> </span><span class="n">ident</span><span class="w"> </span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="n">any</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">any</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">101</span><span class="p">:</span><span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">[]</span><span class="n">byte</span>
<span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">102</span><span class="p">:</span>
</pre></div>
<p>And the biggest offender is growing the string: the <code>append</code> on
line 93, followed by the <code>string(s)</code> conversion on line 97. The good
thing is that growing this string can be amortized because we can share the
underlying memory across calls on the <code>jsonReader</code>
struct. This way, <code>expectString</code> only needs to grow the buffer when it
actually sees a bigger string than any we've already seen.</p>
<p>The standard library's <a href="https://pkg.go.dev/bytes#Buffer">bytes.Buffer</a> type does
exactly this: <code>Reset</code> empties the buffer but keeps the underlying
storage for future writes. We can put a <code>bytes.Buffer</code> on the <code>jsonReader</code> struct
because this code isn't multithreaded and because <code>expectString</code>
doesn't call itself.</p>
<div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">2</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">2</span><span class="p">,</span><span class="mi">7</span><span class="w"> </span><span class="err">@@</span>
<span class="w"> </span><span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"bufio"</span>
<span class="o">+</span><span class="w"> </span><span class="s">"bytes"</span>
<span class="w"> </span><span class="s">"encoding/json"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"io"</span>
<span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">13</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">14</span><span class="p">,</span><span class="mi">8</span><span class="w"> </span><span class="err">@@</span>
<span class="w"> </span><span class="kd">type</span><span class="w"> </span><span class="nx">jsonReader</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">read</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="o">+</span>
<span class="o">+</span><span class="w"> </span><span class="nx">expectString_buffer</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Buffer</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">reset</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">51</span><span class="p">,</span><span class="mi">7</span><span class="w"> </span><span class="o">+</span><span class="mi">54</span><span class="p">,</span><span class="mi">7</span><span class="w"> </span><span class="err">@@</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">expectString</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="o">-</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="o">+</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString_buffer</span><span class="p">.</span><span class="nx">Reset</span><span class="p">()</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">81</span><span class="p">,</span><span class="mi">18</span><span class="w"> </span><span class="o">+</span><span class="mi">84</span><span class="p">,</span><span class="mi">18</span><span class="w"> </span><span class="err">@@</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'"'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Overwrite the escaped double quote</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\\'</span><span class="w"> </span><span class="p">{</span>
<span class="o">-</span><span class="w"> </span><span class="nx">s</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sc">'"'</span>
<span class="o">+</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString_buffer</span><span class="p">.</span><span class="nx">Bytes</span><span class="p">()[</span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString_buffer</span><span class="p">.</span><span class="nx">Len</span><span class="p">()</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sc">'"'</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Otherwise it's the actual end</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="o">-</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString_buffer</span><span class="p">.</span><span class="nx">WriteByte</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">b</span>
<span class="w"> </span><span class="p">}</span>
<span class="o">-</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span><span class="w"> </span><span class="kc">nil</span>
<span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString_buffer</span><span class="p">.</span><span class="nx">String</span><span class="p">(),</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
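<p>To see the amortization in isolation, here is a minimal sketch outside
of the parser. The <code>stringReader</code> type and its <code>collect</code>
method are hypothetical stand-ins for <code>jsonReader</code> and
<code>expectString</code>; only the <code>bytes.Buffer</code> calls are the same
ones used in the diff above.</p>

```go
package main

import (
	"bytes"
	"fmt"
)

// stringReader is a hypothetical stand-in for the post's jsonReader: it
// keeps one bytes.Buffer alive across calls instead of allocating a new
// []byte every time.
type stringReader struct {
	buf bytes.Buffer
}

// collect copies src into the shared buffer one byte at a time, the way
// expectString does, and returns the result as a string.
func (sr *stringReader) collect(src string) string {
	// Reset sets the length to zero but keeps the underlying storage.
	sr.buf.Reset()
	for i := 0; i < len(src); i++ {
		sr.buf.WriteByte(src[i])
	}
	// String copies the bytes out, so the returned value stays valid
	// after the next Reset.
	return sr.buf.String()
}

func main() {
	var sr stringReader

	first := sr.collect("https://api.github.com/repos/a/b")
	capAfterFirst := sr.buf.Cap()
	second := sr.collect("short")

	fmt.Println(first, second)
	// The shorter second string fits in the storage grown for the first,
	// so the capacity has not changed.
	fmt.Println(sr.buf.Cap() == capAfterFirst)
}
```

<p>Note that <code>String()</code> still allocates a copy for every call.
What goes away is the repeated regrowth of the scratch space, not the final
conversion.</p>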
<p class="note">
Or instead of sharing memory on the struct, maybe this would be a
good place to use <a href="https://pkg.go.dev/sync#Pool">sync.Pool</a>?
</p><p>Disable <code>pkg/profile</code>, build and rerun with hyperfine.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>hyperfine<span class="w"> </span>--warmup<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="s2">"cat large-file.json | ./control/control '.repo.url' > control.test"</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="s2">"cat large-file.json | ./jqgo '.repo.url' > jqgo.test"</span><span class="w"> </span><span class="se">\</span>
<span class="w"> </span><span class="s2">"cat large-file.json | jq '.repo.url' > jq.test"</span>
Benchmark<span class="w"> </span><span class="m">1</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control/control<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>control.test
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">307</span>.2<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">10</span>.8<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">292</span>.8<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">49</span>.4<span class="w"> </span>ms<span class="o">]</span>
<span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">296</span>.5<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">326</span>.2<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs
Benchmark<span class="w"> </span><span class="m">2</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>jqgo.test
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">210</span>.8<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>.2<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">185</span>.4<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">44</span>.9<span class="w"> </span>ms<span class="o">]</span>
<span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">209</span>.1<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">216</span>.8<span class="w"> </span>ms<span class="w"> </span><span class="m">14</span><span class="w"> </span>runs
Benchmark<span class="w"> </span><span class="m">3</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">'.repo.url'</span><span class="w"> </span>><span class="w"> </span>jq.test
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">356</span>.1<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>.6<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">349</span>.1<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">26</span>.9<span class="w"> </span>ms<span class="o">]</span>
<span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">354</span>.1<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">362</span>.9<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs
Summary
<span class="w"> </span><span class="s1">'cat large-file.json | ./jqgo '</span>.repo.url<span class="s1">' > jqgo.test'</span><span class="w"> </span>ran
<span class="w"> </span><span class="m">1</span>.46<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.05<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">'cat large-file.json | ./control/control '</span>.repo.url<span class="s1">' > control.test'</span>
<span class="w"> </span><span class="m">1</span>.69<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.02<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">'cat large-file.json | jq '</span>.repo.url<span class="s1">' > jq.test'</span>
</pre></div>
<p>And we've shaved another 20ms off. That's not bad!</p>
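<p>As for the <code>sync.Pool</code> idea from the note above, I haven't
measured it, but a rough sketch might look like the following. The
<code>bufPool</code> variable and the <code>unquote</code> helper are made up
for illustration and are not part of jqgo; <code>unquote</code> ignores escapes
entirely.</p>

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers. New is only called when the pool
// has nothing to give back.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// unquote is a hypothetical, simplified helper: it strips the
// surrounding double quotes from a JSON string using a pooled buffer.
// It does not handle escape sequences.
func unquote(quoted string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	// A pooled buffer may still hold bytes from its previous user.
	buf.Reset()
	defer bufPool.Put(buf)

	for i := 1; i < len(quoted)-1; i++ {
		buf.WriteByte(quoted[i])
	}
	// String copies the bytes, so it is safe to return the buffer to
	// the pool afterwards.
	return buf.String()
}

func main() {
	fmt.Println(unquote(`"https://example.com/repo"`))
}
```

<p>For a single-threaded program like this one I'd expect the struct field
to be at least as cheap. A pool seems more likely to pay off when many
goroutines each need a scratch buffer.</p>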
<h3 id="coming-to-a-close">Coming to a close</h3><p>There is more we could do but this is a long post already.</p>
<p>For example, in the project repo I also built a <a href="https://github.com/eatonphil/jqgo/blob/main/vector.go">generic vector
type</a> with a
pop operation that serves as the stack in the <code>eatValue</code>
function. It is shared on the <code>jsonReader</code> instance, like the
<code>expectString</code> buffer, and it shaved off another 20ms. I also
got rid of most conversions from <code>[]byte</code> to <code>string</code>. Each
conversion is an expensive allocation; you may notice it listed as
<code>bytes.(*Buffer).String</code> in the <code>top10</code> of
<code>-alloc_objects</code> if you run the profiler again now.</p>
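<p>The linked <code>vector.go</code> is the real thing. As a rough idea of
the shape, here is a minimal sketch of a generic stack whose backing array
survives between uses. The names <code>Vector</code>, <code>Push</code>,
<code>Pop</code>, <code>Reset</code> and <code>Len</code> are my assumptions for
this sketch and may not match the repo.</p>

```go
package main

import "fmt"

// Vector is a hypothetical minimal stack; the real type in the repo's
// vector.go may look different.
type Vector[T any] struct {
	data []T
}

// Push appends x, growing the backing array only when it is full.
func (v *Vector[T]) Push(x T) { v.data = append(v.data, x) }

// Pop removes and returns the last element. The second result is false
// when the vector is empty.
func (v *Vector[T]) Pop() (T, bool) {
	var zero T
	if len(v.data) == 0 {
		return zero, false
	}
	last := v.data[len(v.data)-1]
	v.data = v.data[:len(v.data)-1]
	return last, true
}

// Reset empties the vector but keeps the backing array for reuse.
func (v *Vector[T]) Reset() { v.data = v.data[:0] }

// Len reports how many elements are currently on the stack.
func (v *Vector[T]) Len() int { return len(v.data) }

func main() {
	var stack Vector[byte]

	// Track open brackets the way a value skipper might.
	stack.Push('{')
	stack.Push('[')

	top, _ := stack.Pop()
	fmt.Println(string(top), stack.Len())

	// Reuse the same storage for the next value.
	stack.Reset()
	fmt.Println(stack.Len())
}
```

<p>Because <code>Pop</code> and <code>Reset</code> only reslice, the backing
array allocated for the deepest value seen so far is reused for every later
value.</p>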
<p>But hopefully you're getting the gist of how you might investigate CPU
and memory usage. For me it's still a lot of poking around and trying
different things. But after a few years of trying to get better at
profiling Go programs I think I'm starting to get the hang of it.</p>
http://notes.eatonphil.com/implementing-a-jq-clone-in-go.htmlSun, 10 Jul 2022 00:00:00 +0000
- One year as a solo dev building open-source data tools without fundinghttp://notes.eatonphil.com/2022-06-11-year-in-review.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-06-11-year-in-review.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2022-06-11-year-in-review.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/2022-06-11-year-in-review.htmlFri, 10 Jun 2022 00:00:00 +0000
- Let's build a distributed Postgres proof of concepthttp://notes.eatonphil.com/distributed-postgres.html<p>What is CockroachDB under the hood? Take a look at
<a href="https://github.com/cockroachdb/cockroach/blob/master/go.mod">its go.mod</a>
and notice a number of dependencies that do a lot of work: <a href="https://github.com/jackc/pgproto3">a
PostgreSQL wire protocol
implementation</a>, <a href="https://github.com/cockroachdb/pebble">a storage
layer</a>, <a href="https://github.com/etcd-io/etcd">a Raft implementation
for distributed consensus</a>. And not
part of go.mod but still building on 3rd party code, <a href="https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/parser/sql.y">PostgreSQL's
grammar
definition</a>.</p>
<p>To be <em>absurdly</em> reductionist, CockroachDB is just the glue around these
libraries. With that reductionist mindset, let's try building a
distributed Postgres proof of concept ourselves! We'll use only four
major external libraries: for parsing SQL, handling Postgres's wire
protocol, handling Raft, and handling the storage of table metadata
and rows themselves.</p>
<p class="note">
For a not-reductionist understanding of the CockroachDB internals, I
recommend following the
excellent <a href="https://www.cockroachlabs.com/blog/">Cockroach
Engineering blog</a>
and <a href="https://www.twitch.tv/large__data__bank">Jordan Lewis's
Hacking CockroachDB Twitch stream</a>.
</p><p>By the end of this post, in around 600 lines of code, we'll have a
distributed "Postgres implementation" that will accept writes
(<code>CREATE TABLE</code>, <code>INSERT</code>) on the leader and accept reads (<code>SELECT</code>)
on any node. All nodes will contain the same data.</p>
<p>Here is a sample interaction against the leader:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>psql<span class="w"> </span>-h<span class="w"> </span>localhost<span class="w"> </span>-p<span class="w"> </span><span class="m">6000</span>
psql<span class="w"> </span><span class="o">(</span><span class="m">13</span>.4,<span class="w"> </span>server<span class="w"> </span><span class="m">0</span>.0.0<span class="o">)</span>
Type<span class="w"> </span><span class="s2">"help"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>help.
<span class="nv">phil</span><span class="o">=</span>><span class="w"> </span>create<span class="w"> </span>table<span class="w"> </span>x<span class="w"> </span><span class="o">(</span>age<span class="w"> </span>int,<span class="w"> </span>name<span class="w"> </span>text<span class="o">)</span><span class="p">;</span>
CREATE<span class="w"> </span>ok
<span class="nv">phil</span><span class="o">=</span>><span class="w"> </span>insert<span class="w"> </span>into<span class="w"> </span>x<span class="w"> </span>values<span class="o">(</span><span class="m">14</span>,<span class="w"> </span><span class="s1">'garry'</span><span class="o">)</span>,<span class="w"> </span><span class="o">(</span><span class="m">20</span>,<span class="w"> </span><span class="s1">'ted'</span><span class="o">)</span><span class="p">;</span>
could<span class="w"> </span>not<span class="w"> </span>interpret<span class="w"> </span>result<span class="w"> </span>from<span class="w"> </span>server:<span class="w"> </span>INSERT<span class="w"> </span>ok
INSERT<span class="w"> </span>ok
<span class="nv">phil</span><span class="o">=</span>><span class="w"> </span><span class="k">select</span><span class="w"> </span>name,<span class="w"> </span>age<span class="w"> </span>from<span class="w"> </span>x<span class="p">;</span>
<span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age<span class="w"> </span>
---------+-----
<span class="w"> </span><span class="s2">"garry"</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">14</span>
<span class="w"> </span><span class="s2">"ted"</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">20</span>
<span class="o">(</span><span class="m">2</span><span class="w"> </span>rows<span class="o">)</span>
</pre></div>
<p>And against a follower (note the different port):</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>psql<span class="w"> </span>-h<span class="w"> </span><span class="m">127</span>.0.0.1<span class="w"> </span>-p<span class="w"> </span><span class="m">6001</span>
psql<span class="w"> </span><span class="o">(</span><span class="m">13</span>.4,<span class="w"> </span>server<span class="w"> </span><span class="m">0</span>.0.0<span class="o">)</span>
Type<span class="w"> </span><span class="s2">"help"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>help.
<span class="nv">phil</span><span class="o">=</span>><span class="w"> </span><span class="k">select</span><span class="w"> </span>age,<span class="w"> </span>name<span class="w"> </span>from<span class="w"> </span>x<span class="p">;</span>
<span class="w"> </span>age<span class="w"> </span><span class="p">|</span><span class="w"> </span>name
-----+---------
<span class="w"> </span><span class="m">20</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="s2">"ted"</span>
<span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="s2">"garry"</span>
<span class="o">(</span><span class="m">2</span><span class="w"> </span>rows<span class="o">)</span>
</pre></div>
<p>All code for this post is <a href="https://github.com/eatonphil/waterbugdb">available on GitHub in the fondly named
WaterbugDB repo</a>.</p>
<h3 id="plan-of-attack">Plan of attack</h3><p>Influenced by <a href="https://youtu.be/rqO9PtBkiSQ?t=2332">Philip O'Toole's talk on rqlite at Hacker
Nights</a> we'll
have a Postgres wire protocol server in front. As it receives queries
it will respond immediately to <code>SELECT</code>s. Otherwise for <code>CREATE TABLE</code>s
and <code>INSERT</code>s it will send the entire query string to the Raft
cluster. Each process that is part of the Raft cluster will implement
the appropriate functions for handling Raft messages. In this case the
messages will just be to create a table or insert data.</p>
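<p>The routing decision described above can be sketched in a few lines. This is only an illustration: <code>route</code>, <code>destination</code>, and the two constants are hypothetical names, and the sketch looks at the first keyword instead of parsing the query properly.</p>

```go
package main

import "strings"

// destination says where a query string should be handled.
// All names in this sketch are hypothetical.
type destination string

const (
	answerLocally destination = "local" // SELECT: read from this node's own storage
	sendToRaft    destination = "raft"  // CREATE TABLE / INSERT: replicate through the Raft log
)

// route inspects only the first keyword of the query. That is much cruder
// than real parsing, but it is enough to show the split between reads and
// writes. The bool is false for empty or unsupported statements.
func route(query string) (destination, bool) {
	fields := strings.Fields(query)
	if len(fields) == 0 {
		return "", false
	}
	switch strings.ToUpper(fields[0]) {
	case "SELECT":
		return answerLocally, true
	case "CREATE", "INSERT":
		return sendToRaft, true
	}
	return "", false
}
```

<p>Calling <code>route("select name from x;")</code> yields <code>answerLocally</code>, while an <code>INSERT</code> yields <code>sendToRaft</code>. Anything else is reported as unsupported.</p>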
<p>So every running process will run a Postgres wire protocol server, a
Raft server, and an HTTP server. As you'll see, the HTTP server is an
implementation detail of how processes join the same Raft cluster.</p>
<p>Every running process will have its own directory for storing data.</p>
<h3 id="raft">Raft</h3><p>There is likely a difference between Raft, the paper, and Raft, the
implementations. When I refer to Raft in the rest of this post I'm
going to be referring to an implementation.</p>
<p>And although CockroachDB uses <a href="https://github.com/etcd-io/etcd">etcd's Raft
implementation</a>, I didn't realize
that when I started building this project. I used <a href="https://pkg.go.dev/github.com/hashicorp/raft">Hashicorp's Raft
implementation</a>.</p>
<p>Raft allows us to reliably keep multiple nodes in sync with a log of
messages. Each node in the Raft cluster implements a finite state
machine (FSM) with three operations: apply, snapshot, and restore. Our
finite state machine will embed a Postgres engine we'll build out
after this to handle query execution.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"bytes"</span>
<span class="w"> </span><span class="s">"encoding/json"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"io"</span>
<span class="w"> </span><span class="s">"log"</span>
<span class="w"> </span><span class="s">"net"</span>
<span class="w"> </span><span class="s">"net/http"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"path"</span>
<span class="w"> </span><span class="s">"strings"</span>
<span class="w"> </span><span class="s">"time"</span>
<span class="w"> </span><span class="s">"github.com/google/uuid"</span>
<span class="w"> </span><span class="s">"github.com/hashicorp/raft"</span>
<span class="w"> </span><span class="s">"github.com/hashicorp/raft-boltdb"</span>
<span class="w"> </span><span class="s">"github.com/jackc/pgproto3/v2"</span>
<span class="w"> </span><span class="nx">pgquery</span><span class="w"> </span><span class="s">"github.com/pganalyze/pg_query_go/v2"</span>
<span class="w"> </span><span class="nx">bolt</span><span class="w"> </span><span class="s">"go.etcd.io/bbolt"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">pgFsm</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span>
<span class="p">}</span>
</pre></div>
<p>From what I understand, the snapshot operation allows Raft to truncate
logs. It is used in conjunction with restoring. On startup, if there is
a snapshot, restore is called so you can load the snapshot. Then
afterwards all logs not yet snapshotted are replayed through the apply
operation.</p>
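<p>To make the relationship between the three operations concrete, here is a toy sketch that uses no Raft library at all. The <code>counter</code> state machine, the <code>entry</code> type, and <code>recoverState</code> are hypothetical names invented for this illustration; a real implementation persists snapshots and logs to disk.</p>

```go
package main

// entry is one replicated message. Indexes start at 1, so a lastIndex
// of 0 means nothing has been applied yet.
type entry struct {
	index int
	value int
}

// counter is a toy state machine whose entire state is a running sum.
type counter struct {
	sum       int
	lastIndex int // index of the last entry applied
}

// apply folds a single log entry into the state.
func (c *counter) apply(e entry) {
	c.sum += e.value
	c.lastIndex = e.index
}

// snapshot captures the current state. Every entry up to lastIndex
// could now be dropped from the log.
func (c *counter) snapshot() counter { return *c }

// restore replaces the state with a previously captured snapshot.
func (c *counter) restore(s counter) { *c = s }

// recoverState mimics startup: load the snapshot if there is one, then
// replay only the entries that the snapshot does not already cover.
func recoverState(snap *counter, entries []entry) counter {
	var c counter
	if snap != nil {
		c.restore(*snap)
	}
	for _, e := range entries {
		if e.index > c.lastIndex {
			c.apply(e)
		}
	}
	return c
}
```

<p>Replaying the whole log from scratch and restoring a snapshot before replaying the remaining entries end in the same state. The snapshot only saves work.</p>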
<p>To keep this implementation simple we'll just fail all snapshots so
restore will never be called and all logs will be replayed every time
on startup through the apply operation. This is of course inefficient
but it keeps the code simpler.</p>
<p>When we write the startup code we'll need to delete the database so
that these apply calls happen fresh.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="w"> </span><span class="kd">struct</span><span class="p">{}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">sn</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">)</span><span class="w"> </span><span class="nx">Persist</span><span class="p">(</span><span class="nx">sink</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">SnapshotSink</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">sink</span><span class="p">.</span><span class="nx">Cancel</span><span class="p">()</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">sn</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">)</span><span class="w"> </span><span class="nx">Release</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pf</span><span class="w"> </span><span class="o">*</span><span class="nx">pgFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Snapshot</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="nx">raft</span><span class="p">.</span><span class="nx">FSMSnapshot</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">{},</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pf</span><span class="w"> </span><span class="o">*</span><span class="nx">pgFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Restore</span><span class="p">(</span><span class="nx">rc</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">ReadCloser</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Nothing to restore"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Finally, applying means receiving a single message from the log and
applying it to this node's state. In this project the message will be a
<code>CREATE TABLE</code> or <code>INSERT</code> query. So we'll parse the query and pass it
to the Postgres engine for execution.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pf</span><span class="w"> </span><span class="o">*</span><span class="nx">pgFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Apply</span><span class="p">(</span><span class="nx">log</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Log</span><span class="p">)</span><span class="w"> </span><span class="kd">interface</span><span class="p">{}</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Type</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">LogCommand</span><span class="p">:</span>
<span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">log</span><span class="p">.</span><span class="nx">Data</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not parse payload: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pf</span><span class="p">.</span><span class="nx">pe</span><span class="p">.</span><span class="nx">execute</span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unknown raft log type: %#v"</span><span class="p">,</span><span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Type</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>Panicking here is actually the <a href="https://github.com/hashicorp/raft/issues/307">advised
behavior</a>.</p>
<h4 id="raft-server">Raft server</h4><p>Now we can set up the actual Raft server and pass it an instance of this
FSM. Most of this is boilerplate whose details would matter in a
production install. For our purposes we just need to tell Raft where to run
and how to store its own internal data, including its all-important
message log.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">setupRaft</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">nodeId</span><span class="p">,</span><span class="w"> </span><span class="nx">raftAddress</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">pf</span><span class="w"> </span><span class="o">*</span><span class="nx">pgFsm</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirAll</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">ModePerm</span><span class="p">)</span>
<span class="w"> </span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raftboltdb</span><span class="p">.</span><span class="nx">NewBoltStore</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="s">"bolt"</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not create bolt store: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">snapshots</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewFileSnapshotStore</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="s">"snapshot"</span><span class="p">),</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Stderr</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not create snapshot store: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">tcpAddr</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">ResolveTCPAddr</span><span class="p">(</span><span class="s">"tcp"</span><span class="p">,</span><span class="w"> </span><span class="nx">raftAddress</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not resolve address: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">transport</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewTCPTransport</span><span class="p">(</span><span class="nx">raftAddress</span><span class="p">,</span><span class="w"> </span><span class="nx">tcpAddr</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="o">*</span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Stderr</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not create tcp transport: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">raftCfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">DefaultConfig</span><span class="p">()</span>
<span class="w"> </span><span class="nx">raftCfg</span><span class="p">.</span><span class="nx">LocalID</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerID</span><span class="p">(</span><span class="nx">nodeId</span><span class="p">)</span>
<span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewRaft</span><span class="p">(</span><span class="nx">raftCfg</span><span class="p">,</span><span class="w"> </span><span class="nx">pf</span><span class="p">,</span><span class="w"> </span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span class="nx">snapshots</span><span class="p">,</span><span class="w"> </span><span class="nx">transport</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not create raft instance: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Cluster consists of unjoined leaders. Picking a leader and</span>
<span class="w"> </span><span class="c1">// creating a real cluster is done manually after startup.</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">BootstrapCluster</span><span class="p">(</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Configuration</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Servers</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Server</span><span class="p">{</span>
<span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ID</span><span class="p">:</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerID</span><span class="p">(</span><span class="nx">nodeId</span><span class="p">),</span>
<span class="w"> </span><span class="nx">Address</span><span class="p">:</span><span class="w"> </span><span class="nx">transport</span><span class="p">.</span><span class="nx">LocalAddr</span><span class="p">(),</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>Every instance of this process will run this and will start off as a
leader in a new cluster. We'll expose an HTTP server that allows a
leader to talk to other leaders to tell them to stop leading and
follow it. This HTTP endpoint in the HTTP server is how we'll get from
N processes with N leaders and N clusters to N processes with 1 leader
and 1 cluster.</p>
<p>That's basically it for the core Raft bits. So let's build out that
HTTP server and follow endpoint.</p>
<h3 id="http-follow-endpoint">HTTP follow endpoint</h3><p>Our HTTP server will have just one endpoint. It tells process (a)
to contact another process (b) so that process (b) joins process
(a)'s cluster.</p>
<p>The HTTP server will need process (a)'s Raft instance
to be able to start this join action. And in order for Raft to know
how to contact process (b), we'll need to tell it both
process (b)'s unique Raft node id (we'll give it a unique id ourselves
when we start the process) and the address of process (b)'s Raft server.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">httpServer</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">hs</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">)</span><span class="w"> </span><span class="nx">addFollowerHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">followerId</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">"id"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">followerAddr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">"addr"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">State</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">Leader</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Set the status before writing the body; a later status change is ignored.</span>
<span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span>
<span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">w</span><span class="p">).</span><span class="nx">Encode</span><span class="p">(</span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Error</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="s">`json:"error"`</span>
<span class="w"> </span><span class="p">}{</span>
<span class="w"> </span><span class="s">"Not the leader"</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">AddVoter</span><span class="p">(</span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerID</span><span class="p">(</span><span class="nx">followerId</span><span class="p">),</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerAddress</span><span class="p">(</span><span class="nx">followerAddr</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">).</span><span class="nx">Error</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Failed to add follower: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusOK</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>That's it! Let's move on to the query engine.</p>
<h3 id="query-engine">Query engine</h3><p>The query engine is a wrapper around a storage layer. We'll bring in
<a href="https://github.com/etcd-io/bbolt">bbolt</a>.</p>
<p class="note">
I originally built this
with <a href="https://github.com/cockroachdb/pebble">Cockroach's pebble</a> but pebble has a
<a href="https://app.bountysource.com/issues/99017984-unable-to-build-xxhash-conflicts-with-other-package">transitive dependency on a C library that has function names that
conflict with function names in the C library that pg_query_go
wraps</a>.
</p><div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">pgEngine</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">DB</span>
<span class="w"> </span><span class="nx">bucketName</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">newPgEngine</span><span class="p">(</span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">DB</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">pgEngine</span><span class="p">{</span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="s">"data"</span><span class="p">)}</span>
<span class="p">}</span>
</pre></div>
<p class="note">
bbolt organizes data into buckets. Buckets might be a natural way to
store table rows (one bucket per table) but to keep the implementation
simple we'll put all table metadata and row data into a single <code>data</code>
bucket.
</p><p>The entrypoint we called in the Raft apply implementation above was
<code>execute</code>. It took a parsed list of statements. We'll iterate over the
statements, figure out the kind of each statement, and call out to a
dedicated helper for each kind. Note that every branch returns, so only
the first statement in the list is actually executed.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="nx">execute</span><span class="p">(</span><span class="nx">tree</span><span class="w"> </span><span class="o">*</span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">ParseResult</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">tree</span><span class="p">.</span><span class="nx">GetStmts</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">GetStmt</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">n</span><span class="p">.</span><span class="nx">GetCreateStmt</span><span class="p">();</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">executeCreate</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">n</span><span class="p">.</span><span class="nx">GetInsertStmt</span><span class="p">();</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">executeInsert</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">n</span><span class="p">.</span><span class="nx">GetSelectStmt</span><span class="p">();</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">executeSelect</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unknown statement type: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p class="note">
The pg_query_go docs are not super helpful. I had to build a
<a href="https://github.com/eatonphil/waterbugdb/blob/main/astexplorer/main.go">separate
AST explorer program</a> to make it easier to understand this parser.
</p><p>Let's start with creating a table.</p>
<h3 id="create-table">Create table</h3><p>When a table is created, we'll need to store its metadata.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">tableDefinition</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">ColumnNames</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="nx">ColumnTypes</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="p">}</span>
</pre></div>
<p>First we pull that metadata out of the AST.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="nx">executeCreate</span><span class="p">(</span><span class="nx">stmt</span><span class="w"> </span><span class="o">*</span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">CreateStmt</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tbl</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tableDefinition</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">Name</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">Relation</span><span class="p">.</span><span class="nx">Relname</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">TableElts</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cd</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">GetColumnDef</span><span class="p">()</span>
<span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnNames</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnNames</span><span class="p">,</span><span class="w"> </span><span class="nx">cd</span><span class="p">.</span><span class="nx">Colname</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Names is namespaced. So `INT` is pg_catalog.int4. `BIGINT` is pg_catalog.int8.</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">cd</span><span class="p">.</span><span class="nx">TypeName</span><span class="p">.</span><span class="nx">Names</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">"."</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">n</span><span class="p">.</span><span class="nx">GetString_</span><span class="p">().</span><span class="nx">Str</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnTypes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnTypes</span><span class="p">,</span><span class="w"> </span><span class="nx">columnType</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Now we need to store this in the storage layer. The easiest/dumbest
way to do this is to serialize the metadata to JSON and store it with
key: <code>tables_${tableName}</code>.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">tableBytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">tbl</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not marshal table: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Update</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">Tx</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bkt</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">CreateBucketIfNotExists</span><span class="p">(</span><span class="nx">pe</span><span class="p">.</span><span class="nx">bucketName</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">bkt</span><span class="p">.</span><span class="nx">Put</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">"tables_"</span><span class="o">+</span><span class="nx">tbl</span><span class="p">.</span><span class="nx">Name</span><span class="p">),</span><span class="w"> </span><span class="nx">tableBytes</span><span class="p">)</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not set key-value: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
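<p>To make the stored shape concrete, here is a minimal standalone
sketch of the key and JSON value that <code>executeCreate</code> ends up
writing. It uses only the standard library; <code>tableKey</code> and
<code>encodeTable</code> are hypothetical helpers for illustration, and the
table and column names are made up.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tableDefinition mirrors the struct defined earlier in this post.
type tableDefinition struct {
	Name        string
	ColumnNames []string
	ColumnTypes []string
}

// tableKey is a hypothetical helper that builds the metadata key.
func tableKey(name string) string {
	return "tables_" + name
}

// encodeTable returns the key and JSON value that would be written for tbl.
func encodeTable(tbl tableDefinition) (string, []byte, error) {
	val, err := json.Marshal(tbl)
	if err != nil {
		return "", nil, fmt.Errorf("could not marshal table: %w", err)
	}
	return tableKey(tbl.Name), val, nil
}

func main() {
	// Example values only: INT maps to pg_catalog.int4 and BIGINT to
	// pg_catalog.int8, as noted in the comment in executeCreate.
	tbl := tableDefinition{
		Name:        "users",
		ColumnNames: []string{"id", "age"},
		ColumnTypes: []string{"pg_catalog.int8", "pg_catalog.int4"},
	}
	key, val, err := encodeTable(tbl)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s => %s\n", key, val)
}
```

<p>Since the struct has no JSON tags, the encoded object uses the Go
field names (<code>Name</code>, <code>ColumnNames</code>, <code>ColumnTypes</code>) as its keys.</p>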
<p>Next we'll build a helper to reverse that operation, pulling out table
metadata from the storage layer by the table name:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="nx">getTableDefinition</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">tableDefinition</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">tbl</span><span class="w"> </span><span class="nx">tableDefinition</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">View</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">Tx</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bkt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">Bucket</span><span class="p">(</span><span class="nx">pe</span><span class="p">.</span><span class="nx">bucketName</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bkt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Table does not exist"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">valBytes</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bkt</span><span class="p">.</span><span class="nx">Get</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">"tables_"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">name</span><span class="p">))</span>
<span class="w"> </span><span class="c1">// Get returns nil when the key is absent.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">valBytes</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Table does not exist"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">valBytes</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">tbl</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not unmarshal table: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">tbl</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="p">}</span>
</pre></div>
<p>That's it for our basic <code>CREATE TABLE</code> support! Let's do <code>INSERT</code> next.</p>
<h3 id="insert-row">Insert row</h3><p>Our <code>INSERT</code> implementation will only handle literal/constant <code>VALUES</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="nx">executeInsert</span><span class="p">(</span><span class="nx">stmt</span><span class="w"> </span><span class="o">*</span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">InsertStmt</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tblName</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">Relation</span><span class="p">.</span><span class="nx">Relname</span>
<span class="w"> </span><span class="nx">slct</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">GetSelectStmt</span><span class="p">().</span><span class="nx">GetSelectStmt</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">ValuesLists</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">rowData</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">values</span><span class="p">.</span><span class="nx">GetList</span><span class="p">().</span><span class="nx">Items</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">GetAConst</span><span class="p">();</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Val</span><span class="p">.</span><span class="nx">GetString_</span><span class="p">();</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">rowData</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">rowData</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">Str</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Val</span><span class="p">.</span><span class="nx">GetInteger</span><span class="p">();</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">rowData</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">rowData</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">Ival</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unknown value type: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>It would be better to abstract this <code>VALUES</code> code into a helper so it
could be used by <code>SELECT</code>s too but out of laziness we'll just keep
this here.</p>
<p>Next we need to write the row to the storage layer. We'll serialize
the row data to JSON (inefficient because we know the row structure,
but JSON is easy). We'll store each row under a key made of a
<code>rows_</code> prefix, the table name, and a freshly generated UUID. When
we're iterating over rows in the table we'll be able to do a prefix scan
that will recover just the rows in this table.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">rowBytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">rowData</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not marshal row: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">uuid</span><span class="p">.</span><span class="nx">New</span><span class="p">().</span><span class="nx">String</span><span class="p">()</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Update</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">Tx</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bkt</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">CreateBucketIfNotExists</span><span class="p">(</span><span class="nx">pe</span><span class="p">.</span><span class="nx">bucketName</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">bkt</span><span class="p">.</span><span class="nx">Put</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">"rows_"</span><span class="o">+</span><span class="nx">tblName</span><span class="o">+</span><span class="s">"_"</span><span class="o">+</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="nx">rowBytes</span><span class="p">)</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not store row: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
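<p>Here is a rough, stdlib-only sketch of what that prefix scan looks
like, using a plain map in place of the bbolt bucket (with bbolt you
would walk a cursor starting at the prefix instead). <code>scanRows</code> is a
hypothetical helper and the keys below are made-up stand-ins for the
UUID-suffixed keys. It also shows one consequence of the JSON encoding:
when a row is decoded back into <code>[]any</code>, integers come back as
<code>float64</code>.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// scanRows is a hypothetical helper that returns every decoded row whose
// key starts with "rows_<table>_", in key order.
func scanRows(store map[string][]byte, table string) ([][]any, error) {
	prefix := "rows_" + table + "_"

	keys := make([]string, 0, len(store))
	for k := range store {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	// Maps are unordered; an ordered key-value store would already
	// iterate in key order, so we sort to imitate that.
	sort.Strings(keys)

	rows := make([][]any, 0, len(keys))
	for _, k := range keys {
		var row []any
		if err := json.Unmarshal(store[k], &row); err != nil {
			return nil, fmt.Errorf("could not unmarshal row %q: %w", k, err)
		}
		rows = append(rows, row)
	}
	return rows, nil
}

func main() {
	store := map[string][]byte{
		"tables_users":   []byte(`{}`),
		"rows_users_a1":  []byte(`["phil", 30]`),
		"rows_users_b2":  []byte(`["ana", 41]`),
		"rows_orders_c3": []byte(`[7]`),
	}

	rows, err := scanRows(store, "users")
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		// encoding/json decodes JSON numbers into float64 for `any`.
		fmt.Printf("%v (second column has Go type %T)\n", row, row[1])
	}
}
```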
<p>Finally we can move on to support <code>SELECT</code>!</p>
<h3 id="select-rows">Select rows</h3><p>Unlike <code>CREATE TABLE</code> and <code>INSERT</code>, <code>SELECT</code> will need to return rows,
column names, and, because the Postgres wire protocol wants them, column
types.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">pgResult</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fieldNames</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="nx">fieldTypes</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="nx">rows</span><span class="w"> </span><span class="p">[][]</span><span class="kt">any</span>
<span class="p">}</span>
</pre></div>
<p>First we pull out the table name and the fields selected, looking up
field types in the table metadata.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="nx">executeSelect</span><span class="p">(</span><span class="nx">stmt</span><span class="w"> </span><span class="o">*</span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">SelectStmt</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">pgResult</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tblName</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">FromClause</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">GetRangeVar</span><span class="p">().</span><span class="nx">Relname</span>
<span class="w"> </span><span class="nx">tbl</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">getTableDefinition</span><span class="p">(</span><span class="nx">tblName</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&</span><span class="nx">pgResult</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">TargetList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fieldName</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">GetResTarget</span><span class="p">().</span><span class="nx">Val</span><span class="p">.</span><span class="nx">GetColumnRef</span><span class="p">().</span><span class="nx">Fields</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">GetString_</span><span class="p">().</span><span class="nx">Str</span>
<span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">fieldNames</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">.</span><span class="nx">fieldNames</span><span class="p">,</span><span class="w"> </span><span class="nx">fieldName</span><span class="p">)</span>
<span class="w"> </span><span class="nx">fieldType</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">""</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">cn</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnNames</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cn</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">fieldName</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fieldType</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnTypes</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">fieldType</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unknown field: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">fieldName</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">fieldTypes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">.</span><span class="nx">fieldTypes</span><span class="p">,</span><span class="w"> </span><span class="nx">fieldType</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Finally, we do a prefix scan to grab all rows in the table from the
storage layer.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="s">"rows_"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">tblName</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">"_"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">View</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">Tx</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">Bucket</span><span class="p">(</span><span class="nx">pe</span><span class="p">.</span><span class="nx">bucketName</span><span class="p">).</span><span class="nx">Cursor</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">k</span><span class="p">,</span><span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="nx">prefix</span><span class="p">);</span><span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">HasPrefix</span><span class="p">(</span><span class="nx">k</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="p">);</span><span class="w"> </span><span class="nx">k</span><span class="p">,</span><span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">row</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unable to unmarshal row: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">targetRow</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">target</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">fieldNames</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">field</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnNames</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">target</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">field</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">targetRow</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">targetRow</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">.</span><span class="nx">rows</span><span class="p">,</span><span class="w"> </span><span class="nx">targetRow</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="c1">// Surface any error from the read transaction instead of dropping it.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
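<p>To see the prefix scan and column projection in isolation, here is a
minimal sketch that uses only the standard library. It assumes a sorted
in-memory map as a stand-in for the bolt bucket; <code>scanPrefix</code> and
<code>project</code> are hypothetical helpers for illustration, not part of
the code above.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// scanPrefix returns the values of all keys that start with prefix, in key
// order. It imitates a cursor Seek followed by Next over a sorted key space.
func scanPrefix(store map[string][]byte, prefix string) [][]byte {
	keys := make([]string, 0, len(store))
	for k := range store {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// "Seek": find the first key that is >= prefix.
	start := sort.SearchStrings(keys, prefix)
	var values [][]byte
	for _, k := range keys[start:] {
		if !strings.HasPrefix(k, prefix) {
			break // Keys are sorted, so no later key can match.
		}
		values = append(values, store[k])
	}
	return values
}

// project picks the requested columns out of a stored row, in the order the
// query asked for them.
func project(columns, targets []string, row []any) ([]any, error) {
	out := make([]any, 0, len(targets))
	for _, target := range targets {
		found := false
		for i, col := range columns {
			if col == target && i < len(row) {
				out = append(out, row[i])
				found = true
				break
			}
		}
		if !found {
			return nil, fmt.Errorf("unknown field: %s", target)
		}
	}
	return out, nil
}

func main() {
	// Made-up sample data using the same "rows_<table>_<id>" key layout.
	store := map[string][]byte{
		"rows_customer_a": []byte(`[14, "garry"]`),
		"rows_customer_b": []byte(`[20, "ted"]`),
		"rows_orders_a":   []byte(`[1, 99]`),
		"tables_customer": []byte(`{}`),
	}
	columns := []string{"age", "name"}
	for _, v := range scanPrefix(store, "rows_customer_") {
		var row []any
		if err := json.Unmarshal(v, &row); err != nil {
			panic(err)
		}
		picked, err := project(columns, []string{"name", "age"}, row)
		if err != nil {
			panic(err)
		}
		fmt.Println(picked...)
	}
}
```

<p>Because the keys are sorted, the scan can stop at the first key that no
longer carries the prefix, which is the same reason the cursor loop above
checks <code>bytes.HasPrefix</code> in its condition.</p>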
<p>That's it for <code>SELECT</code>! The last function we'll implement is a
helper for deleting all data in the storage layer. This will be called
on startup before Raft logs are applied so the database always ends up
in a consistent state.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="nx">delete</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Update</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">Tx</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bkt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">Bucket</span><span class="p">(</span><span class="nx">pe</span><span class="p">.</span><span class="nx">bucketName</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bkt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">DeleteBucket</span><span class="p">(</span><span class="nx">pe</span><span class="p">.</span><span class="nx">bucketName</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">})</span>
<span class="p">}</span>
</pre></div>
<p>And we're ready to move on to the final layer, the Postgres wire
protocol.</p>
<h3 id="postgres-wire-protocol-server">Postgres wire protocol server</h3><p><a href="https://github.com/jackc/pgproto3">jackc/pgproto3</a> is an
implementation of the Postgres wire protocol for Go. It allows us to
implement a server that can respond to requests by Postgres clients
like <code>psql</code>.</p>
<p>It works by wrapping a TCP connection. So we'll start by building a
function that does the TCP serving loop.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">runPgServer</span><span class="p">(</span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">DB</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ln</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">Listen</span><span class="p">(</span><span class="s">"tcp"</span><span class="p">,</span><span class="w"> </span><span class="s">"localhost:"</span><span class="o">+</span><span class="nx">port</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">conn</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ln</span><span class="p">.</span><span class="nx">Accept</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">pc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">{</span><span class="nx">conn</span><span class="p">,</span><span class="w"> </span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">}</span>
<span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">handle</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>The <code>pgConn</code> instance needs access to the database directly so it can
respond to <code>SELECT</code>s. And it needs the Raft instance for all other
queries.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">pgConn</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">conn</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">Conn</span>
<span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">DB</span>
<span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span>
<span class="p">}</span>
</pre></div>
<p>The <code>handle</code> method we called above wraps the connection in a
pgproto3 backend, processes the startup message, and then handles
regular messages in a loop until one of them fails.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pc</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">)</span><span class="w"> </span><span class="nx">handle</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">pgc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">NewBackend</span><span class="p">(</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">NewChunkReader</span><span class="p">(</span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">),</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">)</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">handleStartupMessage</span><span class="p">(</span><span class="nx">pgc</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">handleMessage</span><span class="p">(</span><span class="nx">pgc</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Startup messages include authentication and SSL negotiation. We'll accept
any client for the former and answer "no" to the latter.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pc</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">)</span><span class="w"> </span><span class="nx">handleStartupMessage</span><span class="p">(</span><span class="nx">pgconn</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Backend</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">startupMessage</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgconn</span><span class="p">.</span><span class="nx">ReceiveStartupMessage</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Error receiving startup message: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">startupMessage</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">StartupMessage</span><span class="p">:</span>
<span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">(</span><span class="o">&</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">AuthenticationOk</span><span class="p">{}).</span><span class="nx">Encode</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">(</span><span class="o">&</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">ReadyForQuery</span><span class="p">{</span><span class="nx">TxStatus</span><span class="p">:</span><span class="w"> </span><span class="sc">'I'</span><span class="p">}).</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Error sending ready for query: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">SSLRequest</span><span class="p">:</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">"N"</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Error sending deny SSL request: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">handleStartupMessage</span><span class="p">(</span><span class="nx">pgconn</span><span class="p">)</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unknown startup message: %#v"</span><span class="p">,</span><span class="w"> </span><span class="nx">startupMessage</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
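<p>For a sense of what pgproto3 is decoding here, the following sketch
classifies a startup packet by hand with the standard library. A startup
packet has no type byte: it is a big-endian int32 length followed by an
int32 code, which is either the SSL request magic number or the protocol
version. <code>classifyStartup</code> and <code>authOkAndReady</code> are
hypothetical names for illustration only; the real server should keep using
pgproto3.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	sslRequestCode  = 80877103 // Magic number sent in place of a protocol version.
	protocolVersion = 196608   // Protocol 3.0, encoded as major<<16 | minor.
)

// classifyStartup inspects the first eight bytes of a startup packet:
// a big-endian int32 length followed by a big-endian int32 code.
func classifyStartup(packet []byte) (string, error) {
	if len(packet) < 8 {
		return "", fmt.Errorf("startup packet too short: %d bytes", len(packet))
	}
	code := binary.BigEndian.Uint32(packet[4:8])
	switch code {
	case sslRequestCode:
		return "SSLRequest", nil
	case protocolVersion:
		return "StartupMessage", nil
	default:
		return "", fmt.Errorf("unknown startup code: %d", code)
	}
}

// authOkAndReady builds the two messages sent after accepting a
// StartupMessage: AuthenticationOk ('R', length 8, status 0) followed by
// ReadyForQuery ('Z', length 5, transaction status 'I' for idle).
func authOkAndReady() []byte {
	buf := make([]byte, 0, 15)
	buf = append(buf, 'R', 0, 0, 0, 8, 0, 0, 0, 0)
	buf = append(buf, 'Z', 0, 0, 0, 5, 'I')
	return buf
}

func main() {
	// Build the eight-byte SSL request a client such as psql sends first.
	sslReq := make([]byte, 8)
	binary.BigEndian.PutUint32(sslReq[0:4], 8)
	binary.BigEndian.PutUint32(sslReq[4:8], sslRequestCode)

	kind, err := classifyStartup(sslReq)
	if err != nil {
		panic(err)
	}
	fmt.Println(kind)
	fmt.Printf("%q\n", authOkAndReady())
}
```

<p>Answering an SSL request with the single byte <code>N</code> tells the
client to continue in plaintext, which is why the handler above calls itself
again to read the <code>StartupMessage</code> that follows.</p>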
<p>Within the main <code>handleMessage</code> logic we'll check the type of message.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pc</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">)</span><span class="w"> </span><span class="nx">handleMessage</span><span class="p">(</span><span class="nx">pgc</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Backend</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgc</span><span class="p">.</span><span class="nx">Receive</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Error receiving message: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">msg</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Query</span><span class="p">:</span>
<span class="w"> </span><span class="c1">// TODO</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Terminate</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Received message other than Query from client: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>If the message is a query we'll parse it and respond immediately to <code>SELECT</code>s.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">msg</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Query</span><span class="p">:</span>
<span class="w"> </span><span class="nx">stmts</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">String</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Error parsing query: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">stmts</span><span class="p">.</span><span class="nx">GetStmts</span><span class="p">())</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Only make one request at a time."</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmts</span><span class="p">.</span><span class="nx">GetStmts</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="c1">// Handle SELECTs here</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">GetStmt</span><span class="p">().</span><span class="nx">GetSelectStmt</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">pe</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newPgEngine</span><span class="p">(</span><span class="nx">pc</span><span class="p">.</span><span class="nx">db</span><span class="p">)</span>
<span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">executeSelect</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">writePgResult</span><span class="p">(</span><span class="nx">res</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>(We'll implement the <code>writePgResult</code> helper shortly.) Otherwise
the query is DDL or DML, so we'll add it to the Raft log and return a basic response.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// Otherwise it's DDL/DML, raftify</span>
<span class="w"> </span><span class="nx">future</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">Apply</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">String</span><span class="p">),</span><span class="w"> </span><span class="mi">500</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Millisecond</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">future</span><span class="p">.</span><span class="nx">Error</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not apply: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">future</span><span class="p">.</span><span class="nx">Response</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not apply (internal): %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">done</span><span class="p">(</span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">ToUpper</span><span class="p">(</span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">String</span><span class="p">,</span><span class="w"> </span><span class="s">" "</span><span class="p">)[</span><span class="mi">0</span><span class="p">])</span><span class="o">+</span><span class="s">" ok"</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Terminate</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Received message other than Query from client: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
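<p>The tag passed to <code>done</code> above is the first space-separated word
of the query, upper-cased. One thing to watch: <code>strings.Split</code> on a
single space returns an empty first element when the query starts with a
space, and it doesn't split on newlines or tabs. Here is a minimal sketch of a
more forgiving variant, assuming a hypothetical <code>commandTag</code> helper
(not part of the code above) built on <code>strings.Fields</code>.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// commandTag is a hypothetical helper (not in the original code). It
// returns the upper-cased first word of the query followed by " ok",
// splitting on any run of whitespace rather than a single space.
func commandTag(query string) string {
	words := strings.Fields(query)
	if len(words) == 0 {
		// Empty or whitespace-only query: nothing to upper-case.
		return "ok"
	}
	return strings.ToUpper(words[0]) + " ok"
}

func main() {
	fmt.Println(commandTag("insert into t values (1)"))
	fmt.Println(commandTag("\n  create table t (a int)"))
}
```

<p>For queries that start with the keyword, this produces the same tag as the
<code>strings.Split</code> version.</p>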
<p><code>done</code> is an important helper that tells the Postgres connection that
the query is complete and the server is ready to receive another
query. Without this response <code>psql</code> just hangs.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pc</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">)</span><span class="w"> </span><span class="nx">done</span><span class="p">(</span><span class="nx">buf</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">(</span><span class="o">&</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">CommandComplete</span><span class="p">{</span><span class="nx">CommandTag</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">msg</span><span class="p">)}).</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span>
<span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">(</span><span class="o">&</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">ReadyForQuery</span><span class="p">{</span><span class="nx">TxStatus</span><span class="p">:</span><span class="w"> </span><span class="sc">'I'</span><span class="p">}).</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Failed to write query response: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
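<p>To make it less magical, here is roughly what those two messages look like
on the wire. This sketch hand-encodes them with only the standard library
instead of pgproto3; the function names are made up for illustration. Each
message is a one-byte type, a big-endian int32 length (which counts itself but
not the type byte), and then the body.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// frame prefixes body with the message type byte and an int32 length.
// The length covers the 4 length bytes plus the body, not the type byte.
func frame(msgType byte, body []byte) []byte {
	msg := []byte{msgType, 0, 0, 0, 0}
	binary.BigEndian.PutUint32(msg[1:], uint32(4+len(body)))
	return append(msg, body...)
}

// encodeCommandComplete builds a CommandComplete ('C') message whose
// body is the null-terminated command tag.
func encodeCommandComplete(tag string) []byte {
	return frame('C', append([]byte(tag), 0))
}

// encodeReadyForQuery builds a ReadyForQuery ('Z') message whose body
// is a single transaction status byte ('I' means idle).
func encodeReadyForQuery(status byte) []byte {
	return frame('Z', []byte{status})
}

func main() {
	buf := encodeCommandComplete("INSERT ok")
	buf = append(buf, encodeReadyForQuery('I')...)
	fmt.Printf("% x\n", buf)
}
```

<p>The client treats <code>ReadyForQuery</code> as the signal that it can send
its next query, which is why leaving it out makes <code>psql</code> hang.</p>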
<p>And now let's implement the <code>writePgResult</code> helper. This function
needs to translate from our <code>pgResult</code> struct to the format required by
pgproto3.</p>
<div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">dataTypeOIDMap</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">uint32</span><span class="p">{</span>
<span class="w"> </span><span class="s">"text"</span><span class="p">:</span><span class="w"> </span><span class="mi">25</span><span class="p">,</span>
<span class="w"> </span><span class="s">"pg_catalog.int4"</span><span class="p">:</span><span class="w"> </span><span class="mi">23</span><span class="p">,</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pc</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">)</span><span class="w"> </span><span class="nx">writePgResult</span><span class="p">(</span><span class="nx">res</span><span class="w"> </span><span class="o">*</span><span class="nx">pgResult</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">rd</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">RowDescription</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">field</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">res</span><span class="p">.</span><span class="nx">fieldNames</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">rd</span><span class="p">.</span><span class="nx">Fields</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">rd</span><span class="p">.</span><span class="nx">Fields</span><span class="p">,</span><span class="w"> </span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">FieldDescription</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">field</span><span class="p">),</span>
<span class="w"> </span><span class="nx">DataTypeOID</span><span class="p">:</span><span class="w"> </span><span class="nx">dataTypeOIDMap</span><span class="p">[</span><span class="nx">res</span><span class="p">.</span><span class="nx">fieldTypes</span><span class="p">[</span><span class="nx">i</span><span class="p">]],</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">rd</span><span class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">res</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">DataRow</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Failed to marshal cell: %s\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">dr</span><span class="p">.</span><span class="nx">Values</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">dr</span><span class="p">.</span><span class="nx">Values</span><span class="p">,</span><span class="w"> </span><span class="nx">bs</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">dr</span><span class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">done</span><span class="p">(</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"SELECT %d"</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">res</span><span class="p">.</span><span class="nx">rows</span><span class="p">)))</span>
<span class="p">}</span>
</pre></div>
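<p>One consequence of using <code>json.Marshal</code> for each cell is that
text values go out wrapped in double quotes, while integers go out bare. The
sketch below shows that, along with a hand-rolled encoding of a
<code>DataRow</code> message using only the standard library.
<code>encodeDataRow</code> is a made-up name for illustration; pgproto3 does
the real encoding in the code above.</p>

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// encodeDataRow hand-encodes a DataRow ('D') message: an int16 column
// count, then for each column an int32 byte length followed by the
// bytes. A nil column is written as length -1, which marks NULL.
func encodeDataRow(values [][]byte) []byte {
	body := make([]byte, 2)
	binary.BigEndian.PutUint16(body, uint16(len(values)))
	for _, v := range values {
		var n [4]byte
		if v == nil {
			binary.BigEndian.PutUint32(n[:], 0xFFFFFFFF) // -1 as int32
			body = append(body, n[:]...)
			continue
		}
		binary.BigEndian.PutUint32(n[:], uint32(len(v)))
		body = append(body, n[:]...)
		body = append(body, v...)
	}
	// Type byte, then a length that counts itself plus the body.
	msg := []byte{'D', 0, 0, 0, 0}
	binary.BigEndian.PutUint32(msg[1:], uint32(4+len(body)))
	return append(msg, body...)
}

func main() {
	var row [][]byte
	for _, value := range []interface{}{1, "hello"} {
		bs, err := json.Marshal(value)
		if err != nil {
			panic(err)
		}
		row = append(row, bs)
	}
	// The string cell keeps its JSON quotes; the integer cell does not.
	fmt.Printf("cells: %s %s\n", row[0], row[1])
	fmt.Printf("DataRow: % x\n", encodeDataRow(row))
}
```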
<p>And we're done with everything but <code>func main()</code>!</p>
<h3 id="main">Main</h3><p>On startup, each process must be assigned (by the parent process) a
unique node id (any unique string is ok) and ports for the Raft
server, Postgres server, and HTTP server. We'll build a short
<code>getConfig</code> helper to grab these from arguments.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">httpPort</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">raftPort</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">pgPort</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">config</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--node-id"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span>
<span class="w"> </span><span class="nx">i</span><span class="o">++</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--http-port"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">httpPort</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span>
<span class="w"> </span><span class="nx">i</span><span class="o">++</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--raft-port"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span>
<span class="w"> </span><span class="nx">i</span><span class="o">++</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--pg-port"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">pgPort</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span>
<span class="w"> </span><span class="nx">i</span><span class="o">++</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Missing required parameter: --node-id"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Missing required parameter: --raft-port"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">httpPort</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Missing required parameter: --http-port"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">pgPort</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Missing required parameter: --pg-port"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">cfg</span>
<span class="p">}</span>
</pre></div>
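<p>Note that the hand-rolled loop reads <code>os.Args[i+2]</code> without a
bounds check, so a trailing flag with no value panics. It also reassigns
<code>i</code> on every iteration, so the <code>i++</code> has no effect. If
that bothers you, the standard library's <code>flag</code> package can do the
same job. This is only a sketch of an alternative, not what the rest of this
post uses. <code>parseConfig</code> is a made-up name, and the sample
arguments in <code>main</code> are placeholders.</p>

```go
package main

import (
	"flag"
	"fmt"
)

type config struct {
	id       string
	httpPort string
	raftPort string
	pgPort   string
}

// parseConfig reads the four required flags from args (normally
// os.Args[1:]). The flag package accepts both -name and --name.
func parseConfig(args []string) (config, error) {
	var cfg config
	fs := flag.NewFlagSet("node", flag.ContinueOnError)
	fs.StringVar(&cfg.id, "node-id", "", "unique node id")
	fs.StringVar(&cfg.httpPort, "http-port", "", "HTTP server port")
	fs.StringVar(&cfg.raftPort, "raft-port", "", "Raft server port")
	fs.StringVar(&cfg.pgPort, "pg-port", "", "Postgres server port")
	if err := fs.Parse(args); err != nil {
		return cfg, err
	}

	// Every flag is required, so reject any that were left empty.
	required := []struct{ name, value string }{
		{"node-id", cfg.id},
		{"raft-port", cfg.raftPort},
		{"http-port", cfg.httpPort},
		{"pg-port", cfg.pgPort},
	}
	for _, f := range required {
		if f.value == "" {
			return cfg, fmt.Errorf("Missing required parameter: --%s", f.name)
		}
	}
	return cfg, nil
}

func main() {
	// Placeholder arguments; a real program would pass os.Args[1:].
	args := []string{"--node-id", "node1", "--raft-port", "2222",
		"--http-port", "8222", "--pg-port", "6000"}
	cfg, err := parseConfig(args)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```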
<p>Now in <code>main</code> we'll grab the config and set up this process's
database. All processes will put their data in a top-level <code>data</code>
directory to make managing the directories easier. But within that
directory each process will have its own unique paths for data
storage based on the unique node id.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span>
<span class="w"> </span><span class="nx">dataDir</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">"data"</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirAll</span><span class="p">(</span><span class="nx">dataDir</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">ModePerm</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Could not create data directory: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bolt</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dataDir</span><span class="p">,</span><span class="w"> </span><span class="s">"/data"</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="mo">0600</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Could not open bolt db: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
</pre></div>
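<p>The leading slash in <code>"/data"</code> is harmless here because
<code>path.Join</code> cleans the result. A quick sketch of the paths a node
ends up with, assuming a node id of <code>node1</code> (the helper names are
made up for illustration):</p>

```go
package main

import (
	"fmt"
	"path"
)

// dbPath mirrors the path.Join call used for the bolt database file.
func dbPath(dataDir, id string) string {
	return path.Join(dataDir, "/data"+id)
}

// raftDir mirrors the path.Join call used for the Raft directory.
func raftDir(dataDir, id string) string {
	return path.Join(dataDir, "raft"+id)
}

func main() {
	fmt.Println(dbPath("data", "node1"))
	fmt.Println(raftDir("data", "node1"))
}
```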
<p>Each process starts from a clean state, so we delete any existing data in the database.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">pe</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newPgEngine</span><span class="p">(</span><span class="nx">db</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Start off in clean state</span>
<span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nb">delete</span><span class="p">()</span>
</pre></div>
<p>Set up the Raft server.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">pf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&</span><span class="nx">pgFsm</span><span class="p">{</span><span class="nx">pe</span><span class="p">}</span>
<span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">setupRaft</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dataDir</span><span class="p">,</span><span class="w"> </span><span class="s">"raft"</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="s">"localhost:"</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="p">,</span><span class="w"> </span><span class="nx">pf</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Set up the HTTP server.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">hs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">{</span><span class="nx">r</span><span class="p">}</span>
<span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">"/add-follower"</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">addFollowerHandler</span><span class="p">)</span>
<span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">":"</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">httpPort</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}()</span>
</pre></div>
<p>And finally, kick off the Postgres server.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">runPgServer</span><span class="p">(</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">pgPort</span><span class="p">,</span><span class="w"> </span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Finally. Finally. Finally done. Let's give it a go. :)</p>
<h3 id="what-hath-god-wrought">What hath god wrought</h3><p>First, initialize the Go module and then build the app.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>waterbugdb
$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>tidy
$<span class="w"> </span>go<span class="w"> </span>build
</pre></div>
<p>Now in terminal 1 start an instance of the database:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./waterbugdb<span class="w"> </span>--node-id<span class="w"> </span>node1<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2222</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8222</span><span class="w"> </span>--pg-port<span class="w"> </span><span class="m">6000</span>
</pre></div>
<p>Then in terminal 2 start another instance.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./waterbugdb<span class="w"> </span>--node-id<span class="w"> </span>node2<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2223</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8223</span><span class="w"> </span>--pg-port<span class="w"> </span><span class="m">6001</span>
</pre></div>
<p>And in terminal 3, tell <code>node1</code> to have <code>node2</code> follow it.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span><span class="s1">'localhost:8222/add-follower?addr=localhost:2223&id=node2'</span>
</pre></div>
<p>And then open <code>psql</code> against port <code>6000</code>, the leader.</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">psql</span><span class="w"> </span><span class="o">-</span><span class="n">h</span><span class="w"> </span><span class="n">localhost</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">6000</span>
<span class="n">psql</span><span class="w"> </span><span class="o">-</span><span class="n">h</span><span class="w"> </span><span class="mi">127</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">1</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">6000</span>
<span class="n">psql</span><span class="w"> </span><span class="p">(</span><span class="mi">13</span><span class="p">.</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">)</span>
<span class="k">Type</span><span class="w"> </span><span class="ss">"help"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">help</span><span class="p">.</span>
<span class="n">phil</span><span class="o">=></span><span class="w"> </span><span class="k">create</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="p">(</span><span class="n">age</span><span class="w"> </span><span class="nb">int</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="nb">text</span><span class="p">);</span>
<span class="k">CREATE</span><span class="w"> </span><span class="n">ok</span>
<span class="n">phil</span><span class="o">=></span><span class="w"> </span><span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">values</span><span class="p">(</span><span class="mi">14</span><span class="p">,</span><span class="w"> </span><span class="s1">'garry'</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="w"> </span><span class="s1">'ted'</span><span class="p">);</span>
<span class="n">could</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="n">interpret</span><span class="w"> </span><span class="k">result</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">server</span><span class="p">:</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="n">ok</span>
<span class="k">INSERT</span><span class="w"> </span><span class="n">ok</span>
<span class="n">phil</span><span class="o">=></span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">x</span><span class="p">;</span>
<span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">age</span><span class="w"> </span>
<span class="c1">---------+-----</span>
<span class="w"> </span><span class="ss">"garry"</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">14</span>
<span class="w"> </span><span class="ss">"ted"</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">20</span>
<span class="p">(</span><span class="mi">2</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span>
</pre></div>
<p>Now kill <code>node1</code> in terminal 1. Then start it up again. <code>node2</code> will
now be the leader. So exit <code>psql</code> in terminal 3 and enter it again
pointed at <code>node2</code>, port <code>6001</code>. Add new data.</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">psql</span><span class="w"> </span><span class="o">-</span><span class="n">h</span><span class="w"> </span><span class="mi">127</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">1</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">6001</span>
<span class="n">psql</span><span class="w"> </span><span class="p">(</span><span class="mi">13</span><span class="p">.</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">)</span>
<span class="k">Type</span><span class="w"> </span><span class="ss">"help"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">help</span><span class="p">.</span>
<span class="n">phil</span><span class="o">=></span><span class="w"> </span><span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">values</span><span class="p">(</span><span class="mi">19</span><span class="p">,</span><span class="w"> </span><span class="s1">'ava'</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mi">18</span><span class="p">,</span><span class="w"> </span><span class="s1">'ming'</span><span class="p">);</span>
<span class="n">could</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="n">interpret</span><span class="w"> </span><span class="k">result</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">server</span><span class="p">:</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="n">ok</span>
<span class="n">phil</span><span class="o">=></span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="n">age</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">x</span><span class="p">;</span>
<span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">name</span>
<span class="c1">-----+---------</span>
<span class="w"> </span><span class="mi">20</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">"ted"</span>
<span class="w"> </span><span class="mi">14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">"garry"</span>
<span class="w"> </span><span class="mi">18</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">"ming"</span>
<span class="w"> </span><span class="mi">19</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">"ava"</span>
</pre></div>
<p>Exit <code>psql</code> in terminal 3 and start it up once more, this time against <code>node1</code>,
port <code>6000</code>.</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">psql</span><span class="w"> </span><span class="o">-</span><span class="n">h</span><span class="w"> </span><span class="mi">127</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">1</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">6000</span>
<span class="n">psql</span><span class="w"> </span><span class="p">(</span><span class="mi">13</span><span class="p">.</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">)</span>
<span class="k">Type</span><span class="w"> </span><span class="ss">"help"</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">help</span><span class="p">.</span>
<span class="n">phil</span><span class="o">=></span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="n">age</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">x</span><span class="p">;</span>
<span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">name</span>
<span class="c1">-----+---------</span>
<span class="w"> </span><span class="mi">20</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">"ted"</span>
<span class="w"> </span><span class="mi">14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">"garry"</span>
<span class="w"> </span><span class="mi">18</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">"ming"</span>
<span class="w"> </span><span class="mi">19</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">"ava"</span>
<span class="p">(</span><span class="mi">4</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span>
</pre></div>
<p>Nifty stuff.</p>
<h3 id="summary">Summary</h3><p>So on the one hand this was a more complex post than my usual. Each
process needed three servers running. Two of those servers we managed
directly and the Raft server was managed by the Raft library.</p>
<p>On the other hand, we did this all in a really small amount of
code. Yes, many edge cases were unhandled and a massive amount of SQL was
unsupported. And yes, there are tons of inefficiencies, like using JSON,
an unstructured format, when every table has a fixed structure. But
hopefully now you have an idea of how a project like this <em>could be
structured</em>. And there's the beginnings of a framework for filling in
syntax/edge cases over time.</p>
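<p>To make the JSON point concrete, here is a minimal sketch comparing a
JSON-encoded row with a schema-driven binary layout for the example
table <code>x (age int, name text)</code>. The <code>row</code> type, both
function names, and the layout itself (a 4-byte age, a 2-byte length
prefix, then the name bytes) are assumptions made up for illustration,
not something the code in this post does.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
	"log"
)

// row mirrors the example table x (age int, name text).
// Hypothetical type, for illustration only.
type row struct {
	Age  int32
	Name string
}

// encodeJSON repeats the column names inside every single row.
func encodeJSON(r row) ([]byte, error) {
	return json.Marshal(map[string]any{"age": r.Age, "name": r.Name})
}

// encodeFixed leans on the table schema for column order and types:
// a 4-byte age, a 2-byte length prefix, then the raw name bytes.
// Sketch limitation: names longer than 65535 bytes are not handled.
func encodeFixed(r row) []byte {
	var buf bytes.Buffer
	// Writes of fixed-size values to a bytes.Buffer do not fail,
	// so the returned errors are ignored here.
	binary.Write(&buf, binary.BigEndian, r.Age)
	binary.Write(&buf, binary.BigEndian, uint16(len(r.Name)))
	buf.WriteString(r.Name)
	return buf.Bytes()
}

func main() {
	r := row{Age: 14, Name: "garry"}
	j, err := encodeJSON(r)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("json: %d bytes, fixed: %d bytes\n", len(j), len(encodeFixed(r)))
}
```

<p>The binary form is smaller because the column names and types live in
the table definition rather than in each row. The trade-off is that you
can no longer read a row without knowing its schema.</p>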
<p>Additionally, the only problem we solved with consensus was
replication, not sharding. Sharding, and its more complicated cousin
(cross-shard transactions), is truly the special sauce Cockroach
brings.</p>
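<p>For a rough feel of the difference: with replication alone every node
stores every row, while with sharding each key belongs to exactly one
shard. Below is a minimal sketch of the simplest possible scheme, hashing
a key to pick a shard. The <code>shardFor</code> helper is hypothetical and
nothing like it exists in the code above; real systems such as Cockroach
use more involved schemes than a plain modulo.</p>

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a row key to one of n shards by hashing it.
// Hypothetical helper, for illustration only. n must be positive.
func shardFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(n))
}

func main() {
	// Each key lands on exactly one of the three shards. In a sharded
	// and replicated system, each shard would in turn be replicated.
	for _, key := range []string{"garry", "ted", "ava", "ming"} {
		fmt.Printf("%s -> shard %d\n", key, shardFor(key, 3))
	}
}
```

<p>Note that changing the number of shards in this scheme moves most keys
to a different shard, which is one of the reasons real implementations
are more complicated.</p>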
<p>Read more about building an intuition for sharding, replication, and
distributed consensus
<a href="https://notes.eatonphil.com/2024-02-08-an-intuition-for-distributed-consensus-in-oltp-systems.html">here</a>.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New blog post is up :) Let's build a distributed postgres proof of concept.<a href="https://t.co/Z8BDzF1bUw">https://t.co/Z8BDzF1bUw</a> <a href="https://t.co/aSkOjr9Yrh">pic.twitter.com/aSkOjr9Yrh</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1526598365634605058?ref_src=twsrc%5Etfw">May 17, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/distributed-postgres.htmlTue, 17 May 2022 00:00:00 +0000
- SQLite in Go, with and without cgohttp://notes.eatonphil.com/sqlite-in-go-with-and-without-cgo.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-05-12-sqlite-in-go-with-and-without-cgo.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2022-05-12-sqlite-in-go-with-and-without-cgo.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/sqlite-in-go-with-and-without-cgo.htmlThu, 12 May 2022 00:00:00 +0000
- HTML event handler attributes: down the rabbit holehttp://notes.eatonphil.com/event-handler-attributes.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-04-26-event-handler-attributes.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2022-04-26-event-handler-attributes.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/event-handler-attributes.htmlTue, 26 Apr 2022 00:00:00 +0000
- Interview With Phil of DataStationhttp://notes.eatonphil.com/console-101.html<head>
<meta http-equiv="refresh" content="4;URL='https://console.substack.com/p/console-101'" />
</head><p>This is an external interview. Click
<a href="https://console.substack.com/p/console-101">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/console-101.htmlSun, 17 Apr 2022 00:00:00 +0000
- Surveying SQL parser libraries in a few high-level languageshttp://notes.eatonphil.com/sql-parsers.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-04-11-sql-parsers.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2022-04-11-sql-parsers.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/sql-parsers.htmlMon, 11 Apr 2022 00:00:00 +0000
- Writing a document database from scratch in Go: Lucene-like filters and indexeshttp://notes.eatonphil.com/documentdb.html<p>In this post we'll write a rudimentary document database from scratch
in Go. In less than 500 lines of code we'll be able to support the
following interactions, inspired by Elasticsearch:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span>-H<span class="w"> </span><span class="s1">'Content-Type: application/json'</span><span class="w"> </span>-d<span class="w"> </span><span class="s1">'{"name": "Kevin", "age": "45"}'</span><span class="w"> </span>http://localhost:8080/docs
<span class="o">{</span><span class="s2">"body"</span>:<span class="o">{</span><span class="s2">"id"</span>:<span class="s2">"5ac64e74-58f9-4ba4-909e-1d5bf4ddcaa1"</span><span class="o">}</span>,<span class="s2">"status"</span>:<span class="s2">"ok"</span><span class="o">}</span>
$<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">'q=name:"Kevin"'</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq
<span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"count"</span>:<span class="w"> </span><span class="m">1</span>,
<span class="w"> </span><span class="s2">"documents"</span>:<span class="w"> </span><span class="o">[</span>
<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"age"</span>:<span class="w"> </span><span class="s2">"45"</span>,
<span class="w"> </span><span class="s2">"name"</span>:<span class="w"> </span><span class="s2">"Kevin"</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"id"</span>:<span class="w"> </span><span class="s2">"5ac64e74-58f9-4ba4-909e-1d5bf4ddcaa1"</span>
<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="o">]</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"status"</span>:<span class="w"> </span><span class="s2">"ok"</span>
<span class="o">}</span>
$<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">'q=age:<50'</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq
<span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"count"</span>:<span class="w"> </span><span class="m">1</span>,
<span class="w"> </span><span class="s2">"documents"</span>:<span class="w"> </span><span class="o">[</span>
<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"age"</span>:<span class="w"> </span><span class="s2">"45"</span>,
<span class="w"> </span><span class="s2">"name"</span>:<span class="w"> </span><span class="s2">"Kevin"</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"id"</span>:<span class="w"> </span><span class="s2">"5ac64e74-58f9-4ba4-909e-1d5bf4ddcaa1"</span>
<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="o">]</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"status"</span>:<span class="w"> </span><span class="s2">"ok"</span>
<span class="o">}</span>
</pre></div>
<p>The latter query, being a range query, will do a full table scan. But
the first query, an exact match, will use an index and be much
faster.</p>
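<p>Here is a minimal in-memory sketch of why those two queries behave so
differently. It is not the implementation built in this post (that one
sits on a storage engine); the <code>store</code> type, its methods, and the
second sample document are assumptions made up for illustration.</p>

```go
package main

import (
	"fmt"
	"strconv"
)

type document map[string]string

// store keeps documents by id, plus an exact-match index from
// "field=value" to the ids of the documents containing that pair.
type store struct {
	docs  map[string]document
	index map[string][]string
}

func newStore() *store {
	return &store{docs: map[string]document{}, index: map[string][]string{}}
}

// add saves the document and records every field/value pair in the index.
func (s *store) add(id string, d document) {
	s.docs[id] = d
	for k, v := range d {
		key := k + "=" + v
		s.index[key] = append(s.index[key], id)
	}
}

// exact answers a query like name:"Kevin" with a single map lookup.
func (s *store) exact(field, value string) []string {
	return s.index[field+"="+value]
}

// lessThan answers a query like age:<50 by visiting every document,
// because the index only knows about exact values.
func (s *store) lessThan(field string, n float64) []string {
	var ids []string
	for id, d := range s.docs {
		f, err := strconv.ParseFloat(d[field], 64)
		if err == nil && f < n {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	s := newStore()
	s.add("1", document{"name": "Kevin", "age": "45"})
	s.add("2", document{"name": "Mel", "age": "61"})
	fmt.Println(s.exact("name", "Kevin"))
	fmt.Println(s.lessThan("age", 50))
}
```

<p>The cost of <code>exact</code> does not depend on how many documents are
stored, while <code>lessThan</code> gets slower with every document added.</p>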
<p class="note">
Document databases in general may be able to support indexes on
ranges but our rudimentary one won't.
<br />
<br />
Furthermore, this post will not implement full text search.
</p><p>All code for this project is <a href="https://github.com/eatonphil/docdb">available on
GitHub</a>. Let's get started.</p>
<h3 id="server-basics">Server basics</h3><p>Run <code>go mod init</code> and set up <code>main.go</code> with <a href="https://github.com/julienschmidt/httprouter">Julien Schmidt's
httprouter</a>. We'll create
three routes: one for inserting a document, one for retrieving a
document by its id, and one for searching for documents.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"encoding/json"</span>
<span class="w"> </span><span class="s">"log"</span>
<span class="w"> </span><span class="s">"net/http"</span>
<span class="w"> </span><span class="s">"github.com/julienschmidt/httprouter"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">server</span><span class="p">{</span><span class="s">"8080"</span><span class="p">}</span>
<span class="w"> </span><span class="nx">router</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">New</span><span class="p">()</span>
<span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">POST</span><span class="p">(</span><span class="s">"/docs"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">addDocument</span><span class="p">)</span>
<span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">GET</span><span class="p">(</span><span class="s">"/docs"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">searchDocuments</span><span class="p">)</span>
<span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">GET</span><span class="p">(</span><span class="s">"/docs/:id"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getDocument</span><span class="p">)</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Listening on "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">)</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">":"</span><span class="o">+</span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">,</span><span class="w"> </span><span class="nx">router</span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
<p>Now add the routes:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">addDocument</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Unimplemented"</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">searchDocuments</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Unimplemented"</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">getDocument</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Unimplemented"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>That's good enough for now! Let's think about storage.</p>
<h3 id="storage">Storage</h3><p>If you wanted to do this project fully from scratch you could handle
storage by just writing JSON blobs to disk. Nothing in this project
will be much more complex than just writing JSON to disk and the
equivalent of using <code>ls</code> on the filesystem. I mention this because I
said this project is "from scratch" but I'm going to bring in a
storage engine. My point is that you could easily follow this post and
just read/write directly to disk if you felt strongly.</p>
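<p>If you want a feel for what that direct-to-disk version could look
like, here is a minimal sketch: one JSON file per document, with a
directory listing standing in for <code>ls</code>. The function names and
the <code>&lt;id&gt;.json</code> file layout are assumptions for
illustration; this is not the code from the rest of the post.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
)

// writeDoc stores one document as <dir>/<id>.json.
func writeDoc(dir, id string, doc map[string]any) error {
	bs, err := json.Marshal(doc)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, id+".json"), bs, 0o644)
}

// readDoc loads a single document by id.
func readDoc(dir, id string) (map[string]any, error) {
	bs, err := os.ReadFile(filepath.Join(dir, id+".json"))
	if err != nil {
		return nil, err
	}
	var doc map[string]any
	err = json.Unmarshal(bs, &doc)
	return doc, err
}

// listIDs is the "ls" part: every file in dir is one document.
func listIDs(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var ids []string
	for _, e := range entries {
		ids = append(ids, strings.TrimSuffix(e.Name(), ".json"))
	}
	return ids, nil
}

// demo writes one document into a fresh temporary directory, reads it
// back, and returns it along with the ids found by listing the directory.
func demo() (map[string]any, []string, error) {
	dir, err := os.MkdirTemp("", "docdb")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(dir)

	doc := map[string]any{"name": "Kevin", "age": "45"}
	if err := writeDoc(dir, "doc1", doc); err != nil {
		return nil, nil, err
	}
	got, err := readDoc(dir, "doc1")
	if err != nil {
		return nil, nil, err
	}
	ids, err := listIDs(dir)
	return got, ids, err
}

func main() {
	doc, ids, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(doc, ids)
}
```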
<p class="note">
Because there were so many folks misconstruing this paragraph, I've
ported this blog post without Pebble as proof :D. You
can <a href="https://github.com/eatonphil/docdb/pull/1">find the
diff here</a>. Took me an hour for the +40/-40 diff that is still
<500 lines of code. You may notice the code basically looks
identical. That's because the storage engine isn't the interesting
part. :)
</p><p>Any storage engine would be fine: direct read/write, SQLite,
PostgreSQL. But we're going to grab a key-value storage engine. I've
used Badger before so I'm going to try out <a href="https://github.com/cockroachdb/pebble">Cockroach Labs'
Pebble</a> this time instead.</p>
<p>Add <code>"github.com/cockroachdb/pebble"</code> to the list of imports. Then
upgrade the <code>server</code> struct to store an instance of a Pebble database.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">DB</span>
<span class="w"> </span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">newServer</span><span class="p">(</span><span class="nx">database</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">server</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">server</span><span class="p">{</span><span class="nx">db</span><span class="p">:</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">port</span><span class="p">:</span><span class="w"> </span><span class="nx">port</span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="nx">database</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Options</span><span class="p">{})</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="p">}</span>
</pre></div>
<p>And upgrade main:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newServer</span><span class="p">(</span><span class="s">"docdb.data"</span><span class="p">,</span><span class="w"> </span><span class="s">"8080"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="nx">router</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">New</span><span class="p">()</span>
<span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">POST</span><span class="p">(</span><span class="s">"/docs"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">addDocument</span><span class="p">)</span>
<span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">GET</span><span class="p">(</span><span class="s">"/docs"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">searchDocuments</span><span class="p">)</span>
<span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">GET</span><span class="p">(</span><span class="s">"/docs/:id"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getDocument</span><span class="p">)</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Listening on "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">)</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">":"</span><span class="o">+</span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">,</span><span class="w"> </span><span class="nx">router</span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
<p>In the future these server settings could be user-configurable. For
now they're hard-coded.</p>
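<p>As a rough sketch of what user-configurable settings might look like,
the standard <code>flag</code> package is enough. The
<code>parseConfig</code> helper, the <code>config</code> struct, and the
<code>-database</code> and <code>-port</code> flag names are assumptions
made for illustration; they are not part of the code in this post.</p>

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config holds the two settings that main currently hard-codes.
type config struct {
	database string
	port     string
}

// parseConfig is a hypothetical helper: it reads the database path and port
// from command-line arguments, falling back to the hard-coded defaults.
func parseConfig(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("docdb", flag.ContinueOnError)
	fs.StringVar(&c.database, "database", "docdb.data", "path to the data directory")
	fs.StringVar(&c.port, "port", "8080", "port to listen on")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseConfig(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	// In the real program these values would be passed to newServer.
	fmt.Println("database:", c.database, "port:", c.port)
}
```

<p>With something like this in place, <code>main</code> could call
<code>newServer(c.database, c.port)</code> instead of passing string
literals.</p>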
<h4 id="storing-data">Storing data</h4><p>When the user sends a JSON document we need to give it a unique ID and
store the ID and document in the database. Since we're using a
key-value storage engine we'll just use the ID as the key and the JSON
document as the value.</p>
<p>To generate the ID we'll use <a href="https://github.com/google/uuid">Google's UUID
package</a>. So make sure to import
<code>"github.com/google/uuid"</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">addDocument</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">ps</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dec</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewDecoder</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">Body</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dec</span><span class="p">.</span><span class="nx">Decode</span><span class="p">(</span><span class="o">&</span><span class="nx">document</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// New unique id for the document</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">uuid</span><span class="p">.</span><span class="nx">New</span><span class="p">().</span><span class="nx">String</span><span class="p">()</span>
<span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">document</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Set</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Sync</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span>
<span class="w"> </span><span class="s">"id"</span><span class="p">:</span><span class="w"> </span><span class="nx">id</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Nothing special: accept a JSON POST body, store it in the
database, and return the generated document ID.</p>
<p class="note">
I'm not sure that using UUIDs here is a good idea but it is easier
than keeping track of the number of rows in the database.
</p><p>The <code>jsonResponse</code> helper can be defined as:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">body</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">data</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span>
<span class="w"> </span><span class="s">"body"</span><span class="p">:</span><span class="w"> </span><span class="nx">body</span><span class="p">,</span>
<span class="w"> </span><span class="s">"status"</span><span class="p">:</span><span class="w"> </span><span class="s">"ok"</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Set headers before WriteHeader; changes made afterwards are not sent</span>
<span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">Header</span><span class="p">().</span><span class="nx">Set</span><span class="p">(</span><span class="s">"Content-Type"</span><span class="p">,</span><span class="w"> </span><span class="s">"application/json"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusOK</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">data</span><span class="p">[</span><span class="s">"status"</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"error"</span>
<span class="w"> </span><span class="nx">data</span><span class="p">[</span><span class="s">"error"</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">()</span>
<span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">enc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">w</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">enc</span><span class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// TODO: set up panic handler?</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>It's a basic wrapper so that all responses are structured JSON.</p>
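<p>To see the envelope without starting the server, the helper can be
called against an in-memory recorder from
<code>net/http/httptest</code>. This is only a sketch: the
<code>record</code> function and <code>errBadInput</code> are
hypothetical names, and <code>jsonResponse</code> is repeated (in
condensed form) so the snippet compiles on its own. This copy sets
<code>Content-Type</code> before calling <code>WriteHeader</code>,
because <code>net/http</code> does not send header changes made after
that call.</p>

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errBadInput is a made-up error used only to exercise the error path.
var errBadInput = errors.New("bad input")

// jsonResponse is a condensed copy of the helper described above.
func jsonResponse(w http.ResponseWriter, body map[string]any, err error) {
	data := map[string]any{"body": body, "status": "ok"}
	status := http.StatusOK
	if err != nil {
		data["status"] = "error"
		data["error"] = err.Error()
		status = http.StatusBadRequest
	}
	// Headers have to be set before the status line is written.
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	if encErr := json.NewEncoder(w).Encode(data); encErr != nil {
		panic(encErr)
	}
}

// record is a hypothetical helper: it runs jsonResponse against a recorder
// and returns the status code, the Content-Type, and the decoded envelope.
func record(body map[string]any, err error) (int, string, map[string]any) {
	rec := httptest.NewRecorder()
	jsonResponse(rec, body, err)
	res := rec.Result()
	defer res.Body.Close()
	var envelope map[string]any
	if decErr := json.NewDecoder(res.Body).Decode(&envelope); decErr != nil {
		panic(decErr)
	}
	return res.StatusCode, res.Header.Get("Content-Type"), envelope
}

func main() {
	code, ctype, envelope := record(map[string]any{"id": "abc"}, nil)
	fmt.Println(code, ctype, envelope["status"])

	code, ctype, envelope = record(nil, errBadInput)
	fmt.Println(code, ctype, envelope["status"], envelope["error"])
}
```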
<h4 id="retrieving-by-id">Retrieving by ID</h4><p>Before we try to test out inserts, let's get retrieval hooked
up. Inserts return an ID in the HTTP response. GETs will grab a
document by ID.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">getDocumentById</span><span class="p">(</span><span class="nx">id</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">valBytes</span><span class="p">,</span><span class="w"> </span><span class="nx">closer</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Get</span><span class="p">(</span><span class="nx">id</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">closer</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">valBytes</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">document</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">document</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">getDocument</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">ps</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ps</span><span class="p">.</span><span class="nx">ByName</span><span class="p">(</span><span class="s">"id"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">document</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getDocumentById</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">id</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span>
<span class="w"> </span><span class="s">"document"</span><span class="p">:</span><span class="w"> </span><span class="nx">document</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>We've now got enough in place to test out these basics!</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>docdb
$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>tidy
$<span class="w"> </span>go<span class="w"> </span>build
$<span class="w"> </span>./docdb
<span class="m">2022</span>/03/28<span class="w"> </span><span class="m">19</span>:28:19<span class="w"> </span>Listening<span class="w"> </span>on<span class="w"> </span><span class="m">8080</span>
</pre></div>
<p>Now, in another terminal, insert a document:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span>-H<span class="w"> </span><span class="s1">'Content-Type: application/json'</span><span class="w"> </span>-d<span class="w"> </span><span class="s1">'{"name": "Kevin", "age": "45"}'</span><span class="w"> </span>http://localhost:8080/docs
<span class="o">{</span><span class="s2">"body"</span>:<span class="o">{</span><span class="s2">"id"</span>:<span class="s2">"c458a3ce-9faf-4431-a058-d9ae2a1651e1"</span><span class="o">}</span>,<span class="s2">"status"</span>:<span class="s2">"ok"</span><span class="o">}</span>
$<span class="w"> </span>curl<span class="w"> </span>http://localhost:8080/docs/c458a3ce-9faf-4431-a058-d9ae2a1651e1
<span class="o">{</span><span class="s2">"body"</span>:<span class="o">{</span><span class="s2">"document"</span>:<span class="o">{</span><span class="s2">"age"</span>:<span class="s2">"45"</span>,<span class="s2">"name"</span>:<span class="s2">"Kevin"</span><span class="o">}}</span>,<span class="s2">"status"</span>:<span class="s2">"ok"</span><span class="o">}</span>
</pre></div>
<p>Perfect! Now let's implement search.</p>
<h3 id="a-filter-language">A filter language</h3><p>First off we need to pick a filter language. Using a JSON data
structure would be fine. We could require the user to POST to a
search endpoint with the JSON filter in the request body.</p>
<p>But <a href="https://lucene.apache.org/core/2_9_4/queryparsersyntax.html">Lucene</a>'s query syntax is pretty simple and we can easily implement
enough of it. The result is more fun.</p>
<p>In our simplification of Lucene there will only be key-value
matches. Field names and field values can be quoted. They must be
quoted if they contain spaces or colons, among other things. Key-value
matches are separated by whitespace. They can only be AND-ed together
and that is done implicitly.</p>
<p>The following are some valid filters in our implementation:</p>
<ul>
<li><code>a:1</code></li>
<li><code>b:fifteen a:<3</code></li>
<li><code>a.b:12</code></li>
<li><code>title:"Which way?"</code></li>
<li><code>" a key 2":tenant</code></li>
<li><code>" flubber ":"blubber "</code></li>
</ul>
<p>Nested paths are specified using dot-separated JSON paths (e.g. <code>a.b</code> would
retrieve <code>4</code> in <code>{"a": {"b": 4, "d": 100}, "c": 8}</code>).</p>
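<p>A minimal sketch of how such a dotted path could be resolved against
a decoded document follows. The <code>lookupPath</code> name and its
<code>(value, found)</code> return shape are assumptions for
illustration, and the sketch ignores quoted keys that themselves
contain periods.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// lookupPath is a hypothetical helper: it walks a decoded JSON document one
// dot-separated segment at a time and reports whether the full path exists.
func lookupPath(doc map[string]any, path string) (any, bool) {
	var current any = doc
	for _, part := range strings.Split(path, ".") {
		// Every segment except the last must lead to a nested object.
		obj, ok := current.(map[string]any)
		if !ok {
			return nil, false
		}
		current, ok = obj[part]
		if !ok {
			return nil, false
		}
	}
	return current, true
}

func main() {
	// Same shape as {"a": {"b": 4, "d": 100}, "c": 8}; encoding/json decodes
	// JSON numbers into float64 when the target is map[string]any.
	doc := map[string]any{
		"a": map[string]any{"b": 4.0, "d": 100.0},
		"c": 8.0,
	}
	fmt.Println(lookupPath(doc, "a.b"))
	fmt.Println(lookupPath(doc, "a.missing"))
}
```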
<h3 id="lexing-strings">Lexing strings</h3><p>Both keys and values are lexed as strings. If they start with a quote,
we keep on accumulating all characters until the ending
quote. Otherwise we accumulate until we stop seeing a digit, letter,
or period.</p>
<div class="highlight"><pre><span></span><span class="c1">// Handles either quoted strings or unquoted strings of contiguous digits, letters, and periods</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">lexString</span><span class="p">(</span><span class="nx">input</span><span class="w"> </span><span class="p">[]</span><span class="kt">rune</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">input</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">input</span><span class="p">[</span><span class="nx">index</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'"'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">index</span><span class="o">++</span>
<span class="w"> </span><span class="nx">foundEnd</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">[]</span><span class="kt">rune</span>
<span class="w"> </span><span class="c1">// TODO: handle nested quotes</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">input</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">input</span><span class="p">[</span><span class="nx">index</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'"'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">foundEnd</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">input</span><span class="p">[</span><span class="nx">index</span><span class="p">])</span>
<span class="w"> </span><span class="nx">index</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">foundEnd</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Expected end of quoted string"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// If unquoted, read as many contiguous digits, letters, and periods as there are</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">[]</span><span class="kt">rune</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="kt">rune</span>
<span class="w"> </span><span class="c1">// TODO: someone needs to validate there's not ...</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">input</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">input</span><span class="p">[</span><span class="nx">index</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!(</span><span class="nx">unicode</span><span class="p">.</span><span class="nx">IsLetter</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">unicode</span><span class="p">.</span><span class="nx">IsDigit</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'.'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span>
<span class="w"> </span><span class="nx">index</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"No string found"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span><span class="w"> </span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p class="note">
This is not something you get right without unit tests. I wrote unit
tests for it while building this project. Always unit test tricky code
where you're likely to have off-by-one errors! I had a bunch.
</p><h3 id="query-parser">Query parser</h3><p>Now we can write the query parser. It first lexes a string for the
key. Then it looks for the operator which can be one of <code>:</code> (meaning
equality), <code>:></code> (meaning greater than), or <code>:<</code> (meaning less
than). It accumulates each key-value pair into an overall list of
AND-ed arguments that make up the query.</p>
<div class="highlight"><pre><span></span><span class="n">type</span><span class="w"> </span><span class="n">queryComparison</span><span class="w"> </span><span class="n">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="p">[]</span><span class="n">string</span>
<span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="n">string</span>
<span class="w"> </span><span class="n">op</span><span class="w"> </span><span class="n">string</span>
<span class="p">}</span>
<span class="n">type</span><span class="w"> </span><span class="n">query</span><span class="w"> </span><span class="n">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ands</span><span class="w"> </span><span class="p">[]</span><span class="n">queryComparison</span>
<span class="p">}</span>
<span class="o">//</span><span class="w"> </span><span class="n">E</span><span class="o">.</span><span class="n">g</span><span class="o">.</span><span class="w"> </span><span class="n">q</span><span class="o">=</span><span class="n">a</span><span class="o">.</span><span class="n">b</span><span class="p">:</span><span class="mi">12</span>
<span class="k">func</span><span class="w"> </span><span class="n">parseQuery</span><span class="p">(</span><span class="n">q</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">query</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">q</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="n">query</span><span class="p">{},</span><span class="w"> </span><span class="n">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">parsed</span><span class="w"> </span><span class="n">query</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">qRune</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span><span class="n">rune</span><span class="p">(</span><span class="n">q</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">qRune</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Eat</span><span class="w"> </span><span class="n">whitespace</span>
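<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">qRune</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">unicode</span><span class="o">.</span><span class="n">IsSpace</span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>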
<span class="w"> </span><span class="n">i</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">nextIndex</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">lexString</span><span class="p">(</span><span class="n">qRune</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s2">"Expected valid key, got [</span><span class="si">%s</span><span class="s2">]: `</span><span class="si">%s</span><span class="s2">`"</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="p">,</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">nextIndex</span><span class="p">:]))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Expect</span><span class="w"> </span><span class="n">some</span><span class="w"> </span><span class="n">operator</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">nextIndex</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">qRune</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">qRune</span><span class="p">[</span><span class="n">nextIndex</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s1">':'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s2">"Expected colon at </span><span class="si">%d</span><span class="s2">, got: `</span><span class="si">%s</span><span class="s2">`"</span><span class="p">,</span><span class="w"> </span><span class="n">nextIndex</span><span class="p">,</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">nextIndex</span><span class="p">:]))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nextIndex</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="n">op</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="s2">"="</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">qRune</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'>'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">qRune</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'<'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">op</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="w"> </span><span class="n">i</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">nextIndex</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">lexString</span><span class="p">(</span><span class="n">qRune</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s2">"Expected valid value, got [</span><span class="si">%s</span><span class="s2">]: `</span><span class="si">%s</span><span class="s2">`"</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="p">,</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">nextIndex</span><span class="p">:]))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nextIndex</span>
<span class="w"> </span><span class="n">argument</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">queryComparison</span><span class="p">{</span><span class="n">key</span><span class="p">:</span><span class="w"> </span><span class="n">strings</span><span class="o">.</span><span class="n">Split</span><span class="p">(</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="s2">"."</span><span class="p">),</span><span class="w"> </span><span class="n">value</span><span class="p">:</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">op</span><span class="p">:</span><span class="w"> </span><span class="n">op</span><span class="p">}</span>
<span class="w"> </span><span class="n">parsed</span><span class="o">.</span><span class="n">ands</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">parsed</span><span class="o">.</span><span class="n">ands</span><span class="p">,</span><span class="w"> </span><span class="n">argument</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="n">parsed</span><span class="p">,</span><span class="w"> </span><span class="n">nil</span>
<span class="p">}</span>
</pre></div>
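<p>To make the operator step easier to see on its own, here is a
minimal sketch of it pulled out into a helper. <code>lexOperator</code> is a
hypothetical name that does not exist in the project; it assumes the
same rune-slice input as <code>lexString</code> and checks bounds before
every index so a query that ends early returns an error instead of
panicking.</p>

```go
package main

import "fmt"

// lexOperator is a hypothetical helper, not part of the original code.
// It reads the operator starting at index i and returns the comparison
// ("=", ">" or "<") together with the index of the first value rune.
func lexOperator(input []rune, i int) (string, int, error) {
	if i >= len(input) || input[i] != ':' {
		return "", i, fmt.Errorf("expected colon at %d", i)
	}
	i++

	// A bare colon means equality; ":>" and ":<" are range comparisons.
	if i < len(input) && (input[i] == '>' || input[i] == '<') {
		return string(input[i]), i + 1, nil
	}
	return "=", i, nil
}

func main() {
	// In each query the key "a" ends at index 1.
	for _, q := range []string{"a:12", "a:>12", "a:<12", "a"} {
		op, next, err := lexOperator([]rune(q), 1)
		fmt.Printf("%q op=%q next=%d err=%v\n", q, op, next, err)
	}
}
```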
<p>Since we're already writing a real lexer, we could do better than
<code>strings.Split(key, ".")</code> when it comes to finding key path parts. But it
isn't a huge deal at this stage, so we keep it simple.</p>
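<p>One place where plain splitting is loose: <code>lexString</code> accepts
any run of letters, digits and dots, so a key like <code>a..b</code> or
<code>.a</code> lexes fine and then splits into a path with an empty part.
A small sketch of a stricter splitter follows. <code>splitKeyPath</code>
is a hypothetical helper, not something the project defines.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// splitKeyPath is a hypothetical, stricter stand-in for
// strings.Split(key, "."): it rejects keys with empty path parts.
func splitKeyPath(key string) ([]string, error) {
	parts := strings.Split(key, ".")
	for _, part := range parts {
		if part == "" {
			return nil, fmt.Errorf("empty path part in key %q", key)
		}
	}
	return parts, nil
}

func main() {
	// Plain splitting keeps the empty middle part.
	fmt.Printf("%q\n", strings.Split("a..b", "."))

	// The stricter version reports it as an error instead.
	for _, key := range []string{"a.b", "a..b", ".a"} {
		parts, err := splitKeyPath(key)
		fmt.Printf("%q -> %q, %v\n", key, parts, err)
	}
}
```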
<h3 id="query-matching">Query matching</h3><p>Now that we've got the query parser, we need to implement an evaluator
for the search endpoint. We need to be able to check whether a given
document meets the filter.</p>
<p>So we iterate over each argument and do the indicated comparison:
equality, greater than or less than. If at any point the comparison
fails, return false immediately. Otherwise if we got through all
arguments and didn't return, there was a match!</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">q</span><span class="w"> </span><span class="nx">query</span><span class="p">)</span><span class="w"> </span><span class="nx">match</span><span class="p">(</span><span class="nx">doc</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">argument</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">q</span><span class="p">.</span><span class="nx">ands</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">getPath</span><span class="p">(</span><span class="nx">doc</span><span class="p">,</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">key</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Handle equality</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">op</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"="</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"%v"</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">value</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">match</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Handle <, ></span>
<span class="w"> </span><span class="nx">right</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseFloat</span><span class="p">(</span><span class="nx">argument</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="kt">float64</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">value</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">float64</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">float32</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">uint</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">uint8</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">uint16</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">uint32</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">uint64</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">int</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">int8</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">int16</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">int32</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">int64</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">string</span><span class="p">:</span>
<span class="w"> </span><span class="nx">left</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseFloat</span><span class="p">(</span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">op</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">">"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="nx">right</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nx">right</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
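<p>As an aside, the numeric cases in the type switch above could be
collapsed by switching on <code>reflect.Kind</code> instead of on the
concrete type. This is only a sketch of an alternative, not what the
project does, and <code>toFloat</code> is a hypothetical name. Also worth
knowing: when a document is decoded with <code>json.Unmarshal</code> into
a <code>map[string]any</code>, JSON numbers arrive as <code>float64</code>,
so the other numeric cases only matter for documents built some other
way.</p>

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
)

// toFloat is a hypothetical helper that converts any Go numeric value
// (or numeric string) to float64. It reports false for everything else.
func toFloat(v any) (float64, bool) {
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return float64(rv.Int()), true
	case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return float64(rv.Uint()), true
	case reflect.Float32, reflect.Float64:
		return rv.Float(), true
	case reflect.String:
		f, err := strconv.ParseFloat(rv.String(), 64)
		return f, err == nil
	}
	// Bools, maps, slices, nil and so on are not comparable as numbers.
	return 0, false
}

func main() {
	for _, v := range []any{12, uint8(3), float32(1.5), "4.25", true, nil} {
		f, ok := toFloat(v)
		fmt.Printf("%v (%T) -> %v, %v\n", v, v, f, ok)
	}
}
```

<p>The trade-off is that reflection is slower and less explicit than
the type switch, which may or may not matter for a full scan over every
document.</p>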
<p class="note">
This bit of Go, which requires a separate case statement for every
possible numeric type so I can convert it to a float, is really annoying.
</p><p>The only additional part to call out in there is <code>getPath</code>. We need to
be able to grab any path within an object since the user could have
made a filter like <code>a.b:12</code>. So let's keep things simple (but less
safe) and implement <code>getPath</code> as a simple loop over the path parts.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">getPath</span><span class="p">(</span><span class="nx">doc</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">docSegment</span><span class="w"> </span><span class="kt">any</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">doc</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">m</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">docSegment</span><span class="p">.(</span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">docSegment</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">m</span><span class="p">[</span><span class="nx">part</span><span class="p">];</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">docSegment</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
<p>A critical thing to point out is that filtering on arrays is not
supported. Any filter that tries to enter an array will fail or return
no results.</p>
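<p>A short, self-contained run shows both behaviors: a nested key
resolves, while a path that passes through an array does not. The
<code>getPath</code> body is copied from above so the example runs on its
own; the sample document is made up for illustration.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// getPath is copied from the post so this example is runnable by itself.
func getPath(doc map[string]any, parts []string) (any, bool) {
	var docSegment any = doc
	for _, part := range parts {
		m, ok := docSegment.(map[string]any)
		if !ok {
			return nil, false
		}

		if docSegment, ok = m[part]; !ok {
			return nil, false
		}
	}

	return docSegment, true
}

func main() {
	// Made-up sample document with one nested object and one array.
	raw := `{"a": {"b": 12}, "tags": [{"name": "x"}]}`

	var doc map[string]any
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		panic(err)
	}

	// Nested objects are walked one key at a time.
	fmt.Println(getPath(doc, []string{"a", "b"})) // 12 true

	// "tags" holds a []any, not a map, so the walk stops there.
	fmt.Println(getPath(doc, []string{"tags", "name"})) // <nil> false

	// Missing keys fail the same way.
	fmt.Println(getPath(doc, []string{"missing"})) // <nil> false
}
```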
<h3 id="search">Search</h3><p>Now that we've got all the tools in place we can implement the search
endpoint. We'll just iterate over all documents in the database and
return all documents that match the filter.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">searchDocuments</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">ps</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">q</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseQuery</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">"q"</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">documents</span><span class="w"> </span><span class="p">[]</span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">iter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">NewIter</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">First</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Valid</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Value</span><span class="p">(),</span><span class="w"> </span><span class="o">&</span><span class="nx">document</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">q</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">document</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">documents</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">documents</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span>
<span class="w"> </span><span class="s">"id"</span><span class="p">:</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">()),</span>
<span class="w"> </span><span class="s">"body"</span><span class="p">:</span><span class="w"> </span><span class="nx">document</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span><span class="s">"documents"</span><span class="p">:</span><span class="w"> </span><span class="nx">documents</span><span class="p">,</span><span class="w"> </span><span class="s">"count"</span><span class="p">:</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">documents</span><span class="p">)},</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Not bad! Let's try it out:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build
$<span class="w"> </span>./docdb
</pre></div>
<p>And in another terminal, try out the search endpoint with no filter:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq
<span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"count"</span>:<span class="w"> </span><span class="m">1</span>,
<span class="w"> </span><span class="s2">"documents"</span>:<span class="w"> </span><span class="o">[</span>
<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"age"</span>:<span class="w"> </span><span class="s2">"45"</span>,
<span class="w"> </span><span class="s2">"name"</span>:<span class="w"> </span><span class="s2">"Kevin"</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"id"</span>:<span class="w"> </span><span class="s2">"c458a3ce-9faf-4431-a058-d9ae2a1651e1"</span>
<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="o">]</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"status"</span>:<span class="w"> </span><span class="s2">"ok"</span>
<span class="o">}</span>
</pre></div>
<p>With an equality filter:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">'q=name:Mel'</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq
<span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"count"</span>:<span class="w"> </span><span class="m">0</span>,
<span class="w"> </span><span class="s2">"documents"</span>:<span class="w"> </span>null
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"status"</span>:<span class="w"> </span><span class="s2">"ok"</span>
<span class="o">}</span>
$<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">'q=name:Kevin'</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq
<span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"count"</span>:<span class="w"> </span><span class="m">1</span>,
<span class="w"> </span><span class="s2">"documents"</span>:<span class="w"> </span><span class="o">[</span>
<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"age"</span>:<span class="w"> </span><span class="s2">"45"</span>,
<span class="w"> </span><span class="s2">"name"</span>:<span class="w"> </span><span class="s2">"Kevin"</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"id"</span>:<span class="w"> </span><span class="s2">"c458a3ce-9faf-4431-a058-d9ae2a1651e1"</span>
<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="o">]</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"status"</span>:<span class="w"> </span><span class="s2">"ok"</span>
<span class="o">}</span>
</pre></div>
<p>And with greater than/less than filters:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">'q=age:<12'</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq
<span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"count"</span>:<span class="w"> </span><span class="m">0</span>,
<span class="w"> </span><span class="s2">"documents"</span>:<span class="w"> </span>null
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"status"</span>:<span class="w"> </span><span class="s2">"ok"</span>
<span class="o">}</span>
$<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">'q=age:<200'</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq
<span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"count"</span>:<span class="w"> </span><span class="m">1</span>,
<span class="w"> </span><span class="s2">"documents"</span>:<span class="w"> </span><span class="o">[</span>
<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"body"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"age"</span>:<span class="w"> </span><span class="s2">"45"</span>,
<span class="w"> </span><span class="s2">"name"</span>:<span class="w"> </span><span class="s2">"Kevin"</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"id"</span>:<span class="w"> </span><span class="s2">"c458a3ce-9faf-4431-a058-d9ae2a1651e1"</span>
<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="o">]</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"status"</span>:<span class="w"> </span><span class="s2">"ok"</span>
<span class="o">}</span>
</pre></div>
<p>Sweet.</p>
<h3 id="benchmarking">Benchmarking</h3><p>Now let's try inserting a few hundred thousand rows of real-world
data. Grab <code>movies.json</code> from the <a href="https://github.com/prust/wikipedia-movie-data">Wikipedia Movie Data
repo</a>. This dataset
only has 28,000 rows, but we can insert it multiple times. If we
filter by movie name and movie year we'll be looking at only a small
subset of the data, but enough to get a sense of
performance.</p>
<p>Here's a basic script, saved as <code>scripts/load_array.sh</code>, to ingest that data a bunch of times once you've
downloaded the file.</p>
<div class="highlight"><pre><span></span><span class="ch">#!/usr/bin/env bash</span>
<span class="nb">set</span><span class="w"> </span>-e
<span class="nv">count</span><span class="o">=</span><span class="m">50</span>
<span class="k">for</span><span class="w"> </span>run<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="k">$(</span>seq<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="s2">"</span><span class="nv">$count</span><span class="s2">"</span><span class="k">)</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>jq<span class="w"> </span>-c<span class="w"> </span><span class="s1">'.[]'</span><span class="w"> </span><span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="nv">IFS</span><span class="o">=</span><span class="w"> </span><span class="nb">read</span><span class="w"> </span>-r<span class="w"> </span>data<span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span>-H<span class="w"> </span><span class="s1">'Content-Type: application/json'</span><span class="w"> </span>-d<span class="w"> </span><span class="s2">"</span><span class="nv">$data</span><span class="s2">"</span><span class="w"> </span>http://localhost:8080/docs
<span class="w"> </span><span class="k">done</span>
<span class="k">done</span>
</pre></div>
<p>Start it up and wait as long as you can. :)</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>chmod<span class="w"> </span>+x<span class="w"> </span>scripts/load_array.sh
$<span class="w"> </span>./scripts/load_array.sh<span class="w"> </span>movies.json
</pre></div>
<p>You can check how many items are in the database like so:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">'.body.count'</span>
<span class="m">12649</span>
</pre></div>
<p>Once you have a few hundred thousand documents, you'll notice
that exact equality queries start to take longer:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>curl<span class="w"> </span>-s<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">'q="year":1918'</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">'.body.count'</span>
<span class="m">1152</span>
curl<span class="w"> </span>-s<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">'q="year":1918'</span><span class="w"> </span><span class="m">0</span>.00s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.00s<span class="w"> </span>system<span class="w"> </span><span class="m">0</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.992<span class="w"> </span>total
</pre></div>
<p>And you think: although there are hundreds of thousands of documents,
I'm only asking for documents with a certain value, and only about
1000 documents match that value. Shouldn't it be
possible to grab them in less than one whole second? Or at least
in a time that doesn't grow with the number of documents in the database?</p>
<p>Yes. Yes it is possible.</p>
<h3 id="indexes">Indexes</h3><p>Document databases often index everything. We're going to do that. For
every path in a document (that isn't a path within an array) we're
going to store the path and the value of the document at that path.</p>
<p>First we'll open a second database that we'll use to store all of
these path-value pairs.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">DB</span><span class="w"> </span><span class="c1">// Primary data</span>
<span class="w"> </span><span class="nx">indexDb</span><span class="w"> </span><span class="o">*</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">DB</span><span class="w"> </span><span class="c1">// Index data</span>
<span class="w"> </span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">newServer</span><span class="p">(</span><span class="nx">database</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">server</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">server</span><span class="p">{</span><span class="nx">db</span><span class="p">:</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">port</span><span class="p">:</span><span class="w"> </span><span class="nx">port</span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="nx">database</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Options</span><span class="p">{})</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">indexDb</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="nx">database</span><span class="o">+</span><span class="s">".index"</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Options</span><span class="p">{})</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="p">}</span>
</pre></div>
<p>Then when we insert, we'll call an <code>index</code> function to generate all
path-value pairs and store them in this second database.</p>
<p>The index database will store each path-value pair as a key. The value
for that key will be the comma-separated list of IDs of the documents that have that
path-value pair.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">index</span><span class="p">(</span><span class="nx">id</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">pv</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">getPathValues</span><span class="p">(</span><span class="nx">document</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">pathValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">pv</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">idsString</span><span class="p">,</span><span class="w"> </span><span class="nx">closer</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">indexDb</span><span class="p">.</span><span class="nx">Get</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">pathValue</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">ErrNotFound</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Could not look up pathvalue [%#v]: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">pathValue</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">idsString</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">idsString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">id</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
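<span class="w"> </span><span class="c1">// Copy first: the slice returned by Get is only valid until closer is closed</span>
<span class="w"> </span><span class="nx">idsString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="kc">nil</span><span class="p">),</span><span class="w"> </span><span class="nx">idsString</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">idsString</span><span class="p">),</span><span class="w"> </span><span class="s">","</span><span class="p">)</span>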
<span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">existingId</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">existingId</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">found</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">idsString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">idsString</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="s">","</span><span class="o">+</span><span class="nx">id</span><span class="p">)</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">closer</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">closer</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Could not close: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">indexDb</span><span class="p">.</span><span class="nx">Set</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">pathValue</span><span class="p">),</span><span class="w"> </span><span class="nx">idsString</span><span class="p">,</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Sync</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Could not update index: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
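<p>To see the layout this produces without running Pebble, here is a
minimal in-memory sketch. It assumes a plain <code>map[string]string</code> in
place of the index database, and the helper names <code>addToIndex</code> and
<code>lookup</code> are made up for this illustration. The point is that an
exact-match query becomes a single key lookup rather than a scan over
every document.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// addToIndex records id under pathValue, keeping the stored value as a
// comma-separated list of unique document IDs. The map stands in for
// the index database; this helper is hypothetical, not part of the post.
func addToIndex(index map[string]string, pathValue, id string) {
	existing, ok := index[pathValue]
	if !ok || existing == "" {
		index[pathValue] = id
		return
	}
	for _, existingID := range strings.Split(existing, ",") {
		if existingID == id {
			return // already indexed, nothing to do
		}
	}
	index[pathValue] = existing + "," + id
}

// lookup returns the document IDs stored for an exact path-value pair,
// or nil when nothing has been indexed under it.
func lookup(index map[string]string, pathValue string) []string {
	existing, ok := index[pathValue]
	if !ok || existing == "" {
		return nil
	}
	return strings.Split(existing, ",")
}

func main() {
	index := map[string]string{}
	addToIndex(index, "year=1918", "doc-1")
	addToIndex(index, "year=1918", "doc-2")
	addToIndex(index, "year=1918", "doc-1") // duplicate, ignored
	addToIndex(index, "year=1919", "doc-3")

	fmt.Println(lookup(index, "year=1918")) // [doc-1 doc-2]
	fmt.Println(lookup(index, "year=2000")) // []
}
```

<p>The comma-separated value keeps the sketch close to the post's layout.
It also means every insert rewrites the whole ID list for that key,
which is fine for a toy but worth keeping in mind.</p>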
<p>Keeping things simple, we'll also implement this <code>getPathValues</code> helper
recursively:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">getPathValues</span><span class="p">(</span><span class="nx">obj</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">pvs</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">obj</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Build the full dotted path first so nested objects keep their ancestors</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">key</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">"."</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">key</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">val</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">:</span>
<span class="w"> </span><span class="nx">pvs</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">pvs</span><span class="p">,</span><span class="w"> </span><span class="nx">getPathValues</span><span class="p">(</span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">key</span><span class="p">)</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="p">[]</span><span class="kd">interface</span><span class="p">{}:</span>
<span class="w"> </span><span class="c1">// Can't handle arrays</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">pvs</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">pvs</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"%s=%v"</span><span class="p">,</span><span class="w"> </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">pvs</span>
<span class="p">}</span>
</pre></div>
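<p>To get a feel for what these path-value strings look like, here is a
small standalone sketch. <code>flatten</code> is a hypothetical stand-in for
the helper above (not the post's exact code), and the sample document
is invented. It sorts its result only because Go's map iteration order
is random and stable output is easier to read.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// flatten is a hypothetical stand-in for getPathValues: it walks a
// decoded JSON object and returns "path=value" strings, joining nested
// keys with dots and skipping arrays.
func flatten(obj map[string]any, prefix string) []string {
	var pvs []string
	for key, val := range obj {
		path := key
		if prefix != "" {
			path = prefix + "." + key
		}
		switch t := val.(type) {
		case map[string]any:
			pvs = append(pvs, flatten(t, path)...)
		case []any:
			// Arrays are skipped, as in the post
		default:
			pvs = append(pvs, fmt.Sprintf("%s=%v", path, val))
		}
	}
	sort.Strings(pvs) // map iteration order is random; sort for stable output
	return pvs
}

func main() {
	// Invented sample document, shaped like a decoded JSON object.
	doc := map[string]any{
		"title":  "Example",
		"year":   1918,
		"cast":   []any{"A", "B"},
		"studio": map[string]any{"name": "Acme"},
	}
	for _, pv := range flatten(doc, "") {
		fmt.Println(pv)
	}
	// Prints:
	// studio.name=Acme
	// title=Example
	// year=1918
}
```

<p>Each of those strings becomes one key in the index database, so a
document with many fields produces many index writes per insert.</p>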
<p>We'll add one line to <code>s.addDocument</code> to call this <code>index</code> function.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">addDocument</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">ps</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dec</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewDecoder</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">Body</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dec</span><span class="p">.</span><span class="nx">Decode</span><span class="p">(</span><span class="o">&</span><span class="nx">document</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// New unique id for the document</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">uuid</span><span class="p">.</span><span class="nx">New</span><span class="p">().</span><span class="nx">String</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">index</span><span class="p">(</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">document</span><span class="p">)</span>
<span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">document</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Set</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Sync</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span>
<span class="w"> </span><span class="s">"id"</span><span class="p">:</span><span class="w"> </span><span class="nx">id</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>And we'll add a <code>reindex</code> function to be called in <code>main</code> to handle
any documents that were ingested and not indexed (i.e. all the ones we
already inserted).</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">reindex</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">iter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">NewIter</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">First</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Valid</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Value</span><span class="p">(),</span><span class="w"> </span><span class="o">&</span><span class="nx">document</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Unable to parse bad document, %s: %s"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">()),</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">index</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">()),</span><span class="w"> </span><span class="nx">document</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newServer</span><span class="p">(</span><span class="s">"docdb.data"</span><span class="p">,</span><span class="w"> </span><span class="s">"8080"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">reindex</span><span class="p">()</span>
<span class="w"> </span><span class="nx">router</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">New</span><span class="p">()</span>
<span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">POST</span><span class="p">(</span><span class="s">"/docs"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">addDocument</span><span class="p">)</span>
<span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">GET</span><span class="p">(</span><span class="s">"/docs"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">searchDocuments</span><span class="p">)</span>
<span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">GET</span><span class="p">(</span><span class="s">"/docs/:id"</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getDocument</span><span class="p">)</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Listening on "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">)</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">":"</span><span class="o">+</span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">,</span><span class="w"> </span><span class="nx">router</span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
<h3 id="using-the-index">Using the index</h3><p>When a query contains an equality filter we can look that filter
up in the index database. Our filter language only supports AND-ed
arguments, so the documents matching the overall filter must be in the
set intersection of the ids that match each individual equality
filter. Greater-than and less-than filters are not indexed. They are
applied afterward, to the documents fetched for the ids that match
every equality filter.</p>
<p>If no ids are found in the index database meeting all equality filters
then we'll fall back to the full table scan we already have. Passing
<code>skipIndex=true</code> as a query parameter forces that same full scan.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">searchDocuments</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">ps</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">q</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseQuery</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">"q"</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">isRange</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="nx">idsArgumentCount</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">nonRangeArguments</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">argument</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">q</span><span class="p">.</span><span class="nx">ands</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">op</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"="</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">nonRangeArguments</span><span class="o">++</span>
<span class="w"> </span><span class="nx">ids</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"%s=%v"</span><span class="p">,</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">argument</span><span class="p">.</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="s">"."</span><span class="p">),</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">value</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">idsArgumentCount</span><span class="p">[</span><span class="nx">id</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">idsArgumentCount</span><span class="p">[</span><span class="nx">id</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">idsArgumentCount</span><span class="p">[</span><span class="nx">id</span><span class="p">]</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">isRange</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">idsInAll</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">count</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">idsArgumentCount</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">count</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">nonRangeArguments</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">idsInAll</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">idsInAll</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">documents</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">"skipIndex"</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"true"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">idsInAll</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">idsInAll</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">idsInAll</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">document</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getDocumentById</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">id</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isRange</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">q</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">document</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">documents</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">documents</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span>
<span class="w"> </span><span class="s">"id"</span><span class="p">:</span><span class="w"> </span><span class="nx">id</span><span class="p">,</span>
<span class="w"> </span><span class="s">"body"</span><span class="p">:</span><span class="w"> </span><span class="nx">document</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">iter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">NewIter</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">First</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Valid</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Value</span><span class="p">(),</span><span class="w"> </span><span class="o">&</span><span class="nx">document</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">q</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">document</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">documents</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">documents</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span>
<span class="w"> </span><span class="s">"id"</span><span class="p">:</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">()),</span>
<span class="w"> </span><span class="s">"body"</span><span class="p">:</span><span class="w"> </span><span class="nx">document</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span><span class="s">"documents"</span><span class="p">:</span><span class="w"> </span><span class="nx">documents</span><span class="p">,</span><span class="w"> </span><span class="s">"count"</span><span class="p">:</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">documents</span><span class="p">)},</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
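<p>The intersection step in <code>searchDocuments</code> is easier to see on its own. Below is a small sketch of the same counting trick: count how many id lists each id shows up in, and keep the ids whose count equals the number of lists. The <code>intersectByCount</code> name and the sample ids are made up for illustration, and the sketch assumes an id appears at most once per list.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// intersectByCount returns the ids that appear in every one of the given
// id lists. It assumes each list contains a given id at most once.
func intersectByCount(idLists [][]string) []string {
	counts := map[string]int{}
	for _, ids := range idLists {
		for _, id := range ids {
			// A missing key reads as 0, so no explicit initialization is needed.
			counts[id]++
		}
	}

	var inAll []string
	for id, count := range counts {
		if count == len(idLists) {
			inAll = append(inAll, id)
		}
	}

	// Map iteration order is random; sort so the output is stable.
	sort.Strings(inAll)
	return inAll
}

func main() {
	// Hypothetical ids, as if returned by two separate index lookups.
	byYear := []string{"a", "b", "c"}
	byTitle := []string{"b", "c", "d"}

	// Prints: [b c]
	fmt.Println(intersectByCount([][]string{byYear, byTitle}))
}
```

<p>Counting avoids building one set per filter, at the cost of relying on each lookup not returning the same id twice.</p>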
<p>The last unimplemented part is the <code>lookup</code> helper. Given a path-value
pair it checks the index database for the ids that match that pair.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">lookup</span><span class="p">(</span><span class="nx">pathValue</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">idsString</span><span class="p">,</span><span class="w"> </span><span class="nx">closer</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">indexDb</span><span class="p">.</span><span class="nx">Get</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">pathValue</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">ErrNotFound</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not look up pathvalue [%#v]: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">pathValue</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">closer</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">closer</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">idsString</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">idsString</span><span class="p">),</span><span class="w"> </span><span class="s">","</span><span class="p">),</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>We're done. Finally! Let's build it:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build
$<span class="w"> </span>./docdb
</pre></div>
<p>(This is going to take a while to reindex.)</p>
<p>Once the server is ready we can run:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>curl<span class="w"> </span>-s<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">'q="year":1918'</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">'.body.count'</span>
<span class="m">1280</span>
curl<span class="w"> </span>-s<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">'q="year":1918'</span><span class="w"> </span><span class="m">0</span>.01s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.00s<span class="w"> </span>system<span class="w"> </span><span class="m">29</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.029<span class="w"> </span>total
</pre></div>
<p>Hey that's not bad.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Hey here's a new blog post on writing a document database from scratch with support for Lucene-like queries and basic indexes in less than 500 lines of Go<a href="https://t.co/M3js6Pj9h0">https://t.co/M3js6Pj9h0</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1508546397943046150?ref_src=twsrc%5Etfw">March 28, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/documentdb.htmlMon, 28 Mar 2022 00:00:00 +0000
- Speeding up Go's builtin JSON encoder up to 55% for large arrays of objectshttp://notes.eatonphil.com/improving-go-json-encoding-performance-for-large-arrays-of-objects.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-03-03-improving-go-json-encoding-performance-for-large-arrays-of-objects.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2022-03-03-improving-go-json-encoding-performance-for-large-arrays-of-objects.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/improving-go-json-encoding-performance-for-large-arrays-of-objects.htmlThu, 03 Mar 2022 00:00:00 +0000
- SMTP protocol basics from scratch in Go: receiving email from Gmailhttp://notes.eatonphil.com/handling-email-from-gmail-smtp-protocol-basics.html<p>I've never run my own mail server before. Before today I had no clue
how email worked under the hood other than the very few times I've set
up mail clients.</p>
<p>I've heard more than a few times how hard it is to <em>send</em> mail from a
self-hosted server (because of spam filters). But how hard can it be
to hook up DNS to my personal server and receive email to my domain
sent from Gmail or another real-world client?</p>
<p>I knew it would be simpler to just send local mail to a local mail
server with a local mail client but that didn't seem as real. If I
could send email from my Gmail account and receive it in my server I'd
be happy.</p>
<p>I spent the afternoon digging into this. All code is <a href="https://github.com/eatonphil/gomail">available on
GitHub</a>. The "live stream" is in
the <a href="https://discord.multiprocess.io">Multiprocess Discord</a>'s
#hacking-networks channel.</p>
<h3 id="dns">DNS</h3><p>First I bought a domain. (I needed to be able to mess around with
records without blowing up anything important.)</p>
<p>I knew that MX records controlled where mail for a domain is sent. But
I had to <a href="https://en.wikipedia.org/wiki/MX_record">look up the
specifics</a>. You need to
create an MX record that points to an A or AAAA record. So you need
both an MX record and an A or AAAA record.</p>
<p><img src="/dnsrecords.png" alt="MX and A record settings"></p>
<p>Done.</p>
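<p>To double-check the records from code rather than from the DNS provider's UI, here is a minimal sketch using Go's standard <code>net.LookupMX</code> and <code>net.LookupHost</code>. The <code>lowestPreference</code> helper and the <code>example.com</code> default are made up for illustration; swap in your own domain.</p>

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// lowestPreference is a hypothetical helper: it returns the host of
// the MX record with the lowest preference value, which is the one
// senders should try first. It returns "" when there are no records.
func lowestPreference(records []*net.MX) string {
	best := ""
	var bestPref uint16
	for i, r := range records {
		if i == 0 || r.Pref < bestPref {
			best, bestPref = r.Host, r.Pref
		}
	}
	return best
}

func main() {
	// example.com is a placeholder; pass your own domain as the
	// first argument.
	domain := "example.com"
	if len(os.Args) > 1 {
		domain = os.Args[1]
	}

	records, err := net.LookupMX(domain)
	if err != nil {
		fmt.Println("MX lookup failed:", err)
		return
	}
	host := lowestPreference(records)
	fmt.Println("Mail for", domain, "goes to", host)

	// The MX target must itself resolve to an A or AAAA record.
	addrs, err := net.LookupHost(host)
	if err != nil {
		fmt.Println("MX target does not resolve:", err)
		return
	}
	fmt.Println("Which resolves to", addrs)
}
```

<p>If the second lookup fails, the MX record is pointing at a name that has no A or AAAA record yet.</p>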
<h3 id="firewall">Firewall</h3><p>The firewall on Fedora is aggressive. Gotta open up port 25.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>firewall-cmd<span class="w"> </span>--zone<span class="o">=</span>dmz<span class="w"> </span>--add-port<span class="o">=</span><span class="m">25</span>/tcp<span class="w"> </span>--permanent
$<span class="w"> </span>sudo<span class="w"> </span>firewall-cmd<span class="w"> </span>--zone<span class="o">=</span>public<span class="w"> </span>--add-port<span class="o">=</span><span class="m">25</span>/tcp<span class="w"> </span>--permanent
$<span class="w"> </span>sudo<span class="w"> </span>firewall-cmd<span class="w"> </span>--reload
</pre></div>
<p>I don't understand what zones are here.</p>
<h3 id="what-protocols?">What protocols?</h3><p>I knew that you send email with SMTP and you read it with POP3 or
IMAP. But it hadn't clicked before that the mail server has to speak
SMTP and if you only ever read on the server (which is of course
impractical in the real world) you don't need POP3 or IMAP.</p>
<p><img src="https://cdn.educba.com/academy/wp-content/uploads/2019/07/smtp-protocol.png" alt="SMTP vs POP3"></p>
<p>So to meaningfully receive email from Gmail all I needed to do was implement SMTP.</p>
<h3 id="smtp">SMTP</h3><p>First I found the <a href="https://datatracker.ietf.org/doc/html/rfc5321">RFC for
SMTP</a> (or one of them
anyway) and <a href="https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol">the Wikipedia page for
it</a>.</p>
<p>First off I'd need to run a TCP server on port 25.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"errors"</span>
<span class="w"> </span><span class="s">"log"</span>
<span class="w"> </span><span class="s">"net"</span>
<span class="w"> </span><span class="s">"strconv"</span>
<span class="w"> </span><span class="s">"strings"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"[ERROR] %s\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">logInfo</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"[INFO] %s\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">message</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">clientDomain</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">smtpCommands</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">string</span>
<span class="w"> </span><span class="nx">atmHeaders</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">string</span>
<span class="w"> </span><span class="nx">body</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">from</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">date</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">subject</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">connection</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">conn</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">Conn</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">int</span>
<span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="p">}</span>
<span class="c1">// TODO</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">handle</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// TODO</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">l</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">Listen</span><span class="p">(</span><span class="s">"tcp"</span><span class="p">,</span><span class="w"> </span><span class="s">"0.0.0.0:25"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Listening"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">conn</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">Accept</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">connection</span><span class="p">{</span><span class="nx">conn</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">}</span>
<span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">handle</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Just a basic TCP server that hands each connection off to its own
goroutine.</p>
<h3 id="greeting">Greeting</h3><p>After starting a connection, the server must send a greeting. The
successful greeting response code is <code>220</code>. It can optionally be
followed by additional text. Like most commands in SMTP it must be
ended with CRLF (<code>\r\n</code>).</p>
<p>So we'll add a helper function for writing lines that end in CRLF:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">writeLine</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">"\r\n"</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">msg</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">msg</span><span class="p">[</span><span class="nx">n</span><span class="p">:]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
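<p>One way to convince yourself the CRLF handling is right without opening port 25 is to run the helper over an in-memory connection. This is a sketch, not part of the server: it copies <code>writeLine</code> onto a trimmed-down <code>connection</code> struct and uses <code>net.Pipe</code> from the standard library. The <code>roundTrip</code> function is a made-up name for this example.</p>

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// A trimmed-down connection with only the field writeLine needs.
type connection struct {
	conn net.Conn
}

// writeLine mirrors the helper above: append CRLF and keep writing
// until every byte has been sent.
func (c *connection) writeLine(msg string) error {
	msg += "\r\n"
	for len(msg) > 0 {
		n, err := c.conn.Write([]byte(msg))
		if err != nil {
			return err
		}
		msg = msg[n:]
	}
	return nil
}

// roundTrip (hypothetical) sends msg through an in-memory pipe and
// returns the raw bytes the other side received.
func roundTrip(msg string) (string, error) {
	server, client := net.Pipe()
	c := &connection{conn: server}

	errc := make(chan error, 1)
	go func() {
		// Closing the server side lets io.ReadAll below see EOF.
		defer server.Close()
		errc <- c.writeLine(msg)
	}()

	received, err := io.ReadAll(client)
	if err != nil {
		return "", err
	}
	return string(received), <-errc
}

func main() {
	got, err := roundTrip("220")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", got) // prints "220\r\n"
}
```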
<p>And then we'll send that <code>220</code> in the <code>handle</code> function.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">handle</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Connection accepted"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">writeLine</span><span class="p">(</span><span class="s">"220"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Awaiting EHLO"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// TODO</span>
</pre></div>
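<p>The handler calls <code>c.logInfo</code> and <code>c.logError</code>, while the first listing only defined package-level <code>logInfo</code> and <code>logError</code>. The snippet below is a guess at what connection-scoped versions could look like, assuming they just prefix each message with the connection id. The <code>format</code> method and the exact log layout are assumptions, not taken from the repository.</p>

```go
package main

import (
	"fmt"
	"log"
)

// Only the id field matters for logging.
type connection struct {
	id int
}

// format (hypothetical) prefixes a message with a level and the
// connection id so interleaved goroutine logs can be told apart.
func (c *connection) format(level, msg string) string {
	return fmt.Sprintf("[%s] [%d] %s", level, c.id, msg)
}

func (c *connection) logInfo(msg string) {
	log.Println(c.format("INFO", msg))
}

func (c *connection) logError(err error) {
	log.Println(c.format("ERROR", err.Error()))
}

func main() {
	c := &connection{id: 7}
	c.logInfo("Connection accepted")
}
```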
<h3 id="ehlo">EHLO</h3><p>Next we need to be able to read requests from the client. We'll write
a helper that reads until the next CRLF. We'll keep a buffer of unread
bytes in case we accidentally get bytes past the next CRLF. We'll
store that buffer in the connection object.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">readLine</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Check the buffer before reading: an earlier read may already</span>
<span class="w"> </span><span class="c1">// have pulled in a complete line</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// If end of line</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\n'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\r'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// i-1 because drop the CRLF, no one cares after this</span>
<span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[:</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">1</span><span class="p">:]</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// No complete line buffered yet, so read more from the client</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">1024</span><span class="p">)</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">[:</span><span class="nx">n</span><span class="p">]</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
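<p>For comparison, the standard library can do the same line splitting. This is a swapped-in alternative, not what the post uses: a sketch with <code>net/textproto</code>'s <code>Reader.ReadLine</code>, which strips the line ending for you. It is slightly more lenient than the hand-rolled version since it also accepts a bare <code>\n</code>. The <code>splitLines</code> helper and the sample input are invented for the example.</p>

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/textproto"
	"strings"
)

// splitLines (hypothetical) reads every line from input using
// net/textproto, which removes the trailing CRLF from each line.
func splitLines(input string) ([]string, error) {
	tp := textproto.NewReader(bufio.NewReader(strings.NewReader(input)))
	var lines []string
	for {
		line, err := tp.ReadLine()
		if err == io.EOF {
			// No more input: return what we collected.
			return lines, nil
		}
		if err != nil {
			return lines, err
		}
		lines = append(lines, line)
	}
}

func main() {
	// Two commands arriving in one chunk, as a client that does not
	// wait for each reply might send them.
	lines, err := splitLines("EHLO example.com\r\nMAIL FROM:<a@example.com>\r\n")
	if err != nil {
		panic(err)
	}
	for _, line := range lines {
		fmt.Printf("%q\n", line)
	}
}
```

<p>In a server you would wrap the <code>net.Conn</code> once per connection, for example <code>textproto.NewReader(bufio.NewReader(conn))</code>, and call <code>ReadLine</code> wherever the post calls <code>readLine</code>.</p>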
<p>Now back in the <code>handle</code>-er we can read a line from the client. From
the RFC we can see it should be <code>HELO</code> or <code>EHLO</code>. Both sendmail locally
and Gmail only send <code>EHLO</code> though so we'll just check for that.</p>
<p><img src="/ehloresponse.png" alt="EHLO response format"></p>
<p>So we'll validate the message sent is an <code>EHLO</code> and then we'll send
back a <code>250</code> with a space after it. We can ignore the rest of that
response grammar since we don't have additional keywords we want to
send to the client.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logInfo</span><span class="p">(</span><span class="ss">"Awaiting EHLO"</span><span class="p">)</span>
<span class="w"> </span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">readLine</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logError</span><span class="p">(</span><span class="n">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="err">!</span><span class="n">strings</span><span class="p">.</span><span class="n">HasPrefix</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="ss">"EHLO "</span><span class="p">)</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logError</span><span class="p">(</span><span class="n">errors</span><span class="p">.</span><span class="k">New</span><span class="p">(</span><span class="ss">"Expected EHLO got: "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">line</span><span class="p">))</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="n">msg</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">message</span><span class="err">{</span>
<span class="w"> </span><span class="nl">smtpCommands</span><span class="p">:</span><span class="w"> </span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span class="n">string</span><span class="err">{}</span><span class="p">,</span>
<span class="w"> </span><span class="nl">atmHeaders</span><span class="p">:</span><span class="w"> </span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span class="n">string</span><span class="err">{}</span><span class="p">,</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="n">msg</span><span class="p">.</span><span class="n">clientDomain</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">line</span><span class="o">[</span><span class="n">len("EHLO "):</span><span class="o">]</span>
<span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logInfo</span><span class="p">(</span><span class="ss">"Received EHLO"</span><span class="p">)</span>
<span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">writeLine</span><span class="p">(</span><span class="ss">"250 "</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logError</span><span class="p">(</span><span class="n">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logInfo</span><span class="p">(</span><span class="ss">"Done EHLO"</span><span class="p">)</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">TODO</span>
</pre></div>
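<p>Slicing the line at a fixed offset assumes the client always sends a domain after <code>EHLO</code>. A slightly more defensive take, as a sketch: split on whitespace and compare the verb case-insensitively (SMTP command verbs are not case sensitive). <code>parseEHLO</code> is a made-up helper name, not something from the repository.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// parseEHLO (hypothetical) returns the client domain from an
// "EHLO <domain>" line. ok is false when the line is not an EHLO
// or carries no domain.
func parseEHLO(line string) (domain string, ok bool) {
	fields := strings.Fields(line)
	if len(fields) < 2 || !strings.EqualFold(fields[0], "EHLO") {
		return "", false
	}
	return fields[1], true
}

func main() {
	for _, line := range []string{"EHLO mail.example.com", "ehlo localhost", "EHLO", "HELO old.example.com"} {
		domain, ok := parseEHLO(line)
		fmt.Printf("%q -> %q %v\n", line, domain, ok)
	}
}
```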
<h3 id="additional-commands">Additional commands</h3><p>Next up there are a few commands we need to read before we get to the
message body. These include the recipient and sender
addresses. They are formatted vaguely like HTTP headers: they
have a key on the left side of a colon and a value on the right. They
may have a required order too, I'm not sure.</p>
<p>In response to the commands we'll send a <code>250 OK</code>, although I'm not
sure where in the RFC that is suggested.</p>
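<p>As a standalone illustration of that key/value shape, here is a small sketch that splits a command line on the first colon. <code>parseCommand</code> and the sample addresses are invented for the example; the only assumption about the protocol is the colon-separated form described above.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// parseCommand (hypothetical) splits a line such as
// "MAIL FROM:<a@example.com>" into an upper-cased name and a value.
// Lines without a colon, such as "DATA", get an empty value.
func parseCommand(line string) (name, value string) {
	pieces := strings.SplitN(line, ":", 2)
	name = strings.ToUpper(strings.TrimSpace(pieces[0]))
	if len(pieces) == 2 {
		value = strings.TrimSpace(pieces[1])
	}
	return name, value
}

func main() {
	for _, line := range []string{"MAIL FROM:<a@example.com>", "RCPT TO:<b@example.com>", "data"} {
		name, value := parseCommand(line)
		fmt.Printf("%q -> name=%q value=%q\n", line, name, value)
	}
}
```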
<p>In our code we'll just keep looking for these commands until we find
the <code>DATA</code> command. This indicates the body is to follow. And to this
command we respond with a <code>354</code> instead of a <code>250 OK</code>.</p>
<p><img src="/dataresponse.png" alt="DATA response"></p>
<p>In code:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Done EHLO"</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">readLine</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">pieces</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">SplitN</span><span class="p">(</span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="s">":"</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span>
<span class="w"> </span><span class="nx">smtpCommand</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">ToUpper</span><span class="p">(</span><span class="nx">pieces</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="w"> </span><span class="c1">// Special command without a value</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">smtpCommand</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"DATA"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">writeLine</span><span class="p">(</span><span class="s">"354"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
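<span class="w"> </span><span class="c1">// Not every command has a value after a colon (e.g. NOOP)</span>
<span class="w"> </span><span class="nx">smtpValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">""</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">pieces</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">smtpValue</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pieces</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>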
<span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">smtpCommands</span><span class="p">[</span><span class="nx">smtpCommand</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">smtpValue</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Got header: "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">line</span><span class="p">)</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">writeLine</span><span class="p">(</span><span class="s">"250 OK"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Done SMTP headers, reading ARPA text message headers"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// TODO</span>
</pre></div>
<h3 id="message-body,-headers">Message body, headers</h3><p>Now that we've seen the <code>DATA</code> command we are within <em>a</em> message
body. Within this body we still have to read some additional headers.</p>
<p>Through trial-and-error I know to look for some headers like
<code>Subject</code>. By searching the RFC I noticed a reference to <a href="https://datatracker.ietf.org/doc/html/rfc822">RFC
822</a> where these headers
are defined.</p>
<p><img src="/subject.png" alt="ARPA text message headers"></p>
<p>These are apparently ARPA internet text message headers. They
look like HTTP headers, but a single header can span multiple lines
(RFC 822 calls this "folding"). This stumped me for a bit.</p>
<p><img src="/longheaders.png" alt="Multi-line headers"></p>
<p>I decided to write a variant of <code>readLine</code>, called
<code>readMultiLine</code>, that specifically handles these possibly
multi-line headers, where a CRLF followed by a space or tab isn't a
line delimiter.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">readMultiLine</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">noMoreReads</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">' '</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'\t'</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\r'</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\n'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// i-2 because drop the CRLF, no one cares after this</span>
<span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[:</span><span class="nx">i</span><span class="o">-</span><span class="mi">2</span><span class="p">])</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="p">:]</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">noMoreReads</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">isBodyClose</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">noMoreReads</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">1024</span><span class="p">)</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">[:</span><span class="nx">n</span><span class="p">]</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// If this gets here more than once it's going to be an infinite loop</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">isBodyClose</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">4</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\r'</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">3</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\n'</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'.'</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\r'</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\n'</span>
<span class="p">}</span>
</pre></div>
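<p>Note that a line returned by <code>readMultiLine</code> still contains
the CRLF and whitespace of any fold. Here is a minimal sketch of undoing
that, assuming a hypothetical <code>unfold</code> helper that isn't part of
the code above. RFC 822 treats a CRLF followed by a space or tab as
equivalent to the space or tab alone, so the sketch just drops those
CRLFs.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// unfolder drops a CRLF when it is immediately followed by a space or a
// tab, keeping the whitespace character itself.
var unfolder = strings.NewReplacer("\r\n ", " ", "\r\n\t", "\t")

// unfold is a hypothetical helper (not part of gomail). It turns a folded
// header back into a single logical line.
func unfold(header string) string {
	return unfolder.Replace(header)
}

func main() {
	folded := "Subject: What hath\r\n god wrought"
	fmt.Printf("%q\n", unfold(folded))
}
```

<p>I haven't wired this into <code>handle</code>; it would only matter if
you wanted to display or compare header values.</p>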
<p>Now back in the <code>handle</code> function we can read through all of these
headers. According to RFC 822, we're done when we see a double CRLF,
which in our code will show up as an empty line.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Done SMTP headers, reading ARPA text message headers"</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">readMultiLine</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">TrimSpace</span><span class="p">(</span><span class="nx">line</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">pieces</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">SplitN</span><span class="p">(</span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="s">": "</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span>
<span class="w"> </span><span class="nx">atmHeader</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">ToUpper</span><span class="p">(</span><span class="nx">pieces</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
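<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">pieces</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Skip anything that isn't a "Name: value" header</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">atmValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pieces</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>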
<span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">atmHeaders</span><span class="p">[</span><span class="nx">atmHeader</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">atmValue</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">atmHeader</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"SUBJECT"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">subject</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">atmValue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">atmHeader</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"TO"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">to</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">atmValue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">atmHeader</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"FROM"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">atmValue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">atmHeader</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"DATE"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">date</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">atmValue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Done ARPA text message headers, reading body"</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// TODO</span>
</pre></div>
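<p>RFC 822 doesn't require a space after the colon, so splitting on the
first colon and trimming whitespace is a slightly more forgiving way to
pull a header apart. This is only a sketch; <code>parseHeader</code> is a
hypothetical helper, not something the code above uses.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// parseHeader is a hypothetical helper (not part of gomail). It splits a
// header line at the first colon, upper-cases the name, and trims
// whitespace from both halves. ok is false when the line has no colon.
func parseHeader(line string) (name, value string, ok bool) {
	i := strings.IndexByte(line, ':')
	if i < 0 {
		return "", "", false
	}
	name = strings.ToUpper(strings.TrimSpace(line[:i]))
	value = strings.TrimSpace(line[i+1:])
	return name, value, true
}

func main() {
	lines := []string{"Subject: What hath god wrought", "Subject:", "garbage"}
	for _, line := range lines {
		name, value, ok := parseHeader(line)
		fmt.Printf("%q -> name=%q value=%q ok=%v\n", line, name, value, ok)
	}
}
```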
<h3 id="body,-for-real">Body, for real</h3><p>We're finally at the email body as the user typed it. According to the
SMTP RFC the body ends with a CRLF followed by a dot (period) followed
by a CRLF.</p>
<p>So we'll write another helper to read until this marker.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">readToEndOfBody</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">isBodyClose</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[:</span><span class="nx">i</span><span class="o">-</span><span class="mi">4</span><span class="p">]),</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">1024</span><span class="p">)</span>
<span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">[:</span><span class="nx">n</span><span class="p">]</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
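<p>The same search can be written with the standard library's
<code>bytes.Index</code> rather than checking byte by byte. A minimal
sketch, assuming a hypothetical <code>splitBody</code> helper that works on
a plain byte slice instead of the <code>connection</code> struct:</p>

```go
package main

import (
	"bytes"
	"fmt"
)

// endOfData is the SMTP end-of-body marker: CRLF, a dot, CRLF.
var endOfData = []byte("\r\n.\r\n")

// splitBody is a hypothetical helper (not part of gomail). It looks for the
// end-of-body marker in buf and returns the body before the marker, the
// bytes after it, and whether the marker was found at all.
func splitBody(buf []byte) (body, rest []byte, found bool) {
	i := bytes.Index(buf, endOfData)
	if i < 0 {
		// No marker yet: the caller would need to read more bytes.
		return nil, buf, false
	}
	return buf[:i], buf[i+len(endOfData):], true
}

func main() {
	body, rest, found := splitBody([]byte("What hath god wrought\r\n.\r\nQUIT\r\n"))
	fmt.Printf("body=%q rest=%q found=%v\n", body, rest, found)
}
```

<p>Returning the leftover bytes makes it possible to keep reading
commands on the same connection, which the code in this post doesn't
do.</p>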
<p>And we can finish up the <code>handle</code> function.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Done ARPA text message headers, reading body"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">readToEndOfBody</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Got body (%d bytes)"</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">body</span><span class="p">))</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">writeLine</span><span class="p">(</span><span class="s">"250 OK"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Message:\n%s\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">body</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">"Connection closed"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<h3 id="compile,-setcap,-run,-and-send">Compile, setcap, run, and send</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build
$<span class="w"> </span>sudo<span class="w"> </span>setcap<span class="w"> </span><span class="s1">'cap_net_bind_service=+ep'</span><span class="w"> </span>./gomail
$<span class="w"> </span>./gomail
</pre></div>
<p>And send an email in Gmail! It can be to any user since we haven't
implemented anything regarding users. I'll send <code>What hath god
wrought</code> as the subject and message to <code>[email protected]</code>.</p>
<p>And I see:</p>
<div class="highlight"><pre><span></span><span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:17:19<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Listening
<span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Connection<span class="w"> </span>accepted
<span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Awaiting<span class="w"> </span>EHLO
<span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Received<span class="w"> </span>EHLO
<span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Done<span class="w"> </span>EHLO
<span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Got<span class="w"> </span>header:<span class="w"> </span>MAIL<span class="w"> </span>FROM:<[email protected]>
<span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Got<span class="w"> </span>header:<span class="w"> </span>RCPT<span class="w"> </span>TO:<[email protected]>
<span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Done<span class="w"> </span>SMTP<span class="w"> </span>headers,<span class="w"> </span>reading<span class="w"> </span>ARPA<span class="w"> </span>text<span class="w"> </span>message<span class="w"> </span>headers
<span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Done<span class="w"> </span>ARPA<span class="w"> </span>text<span class="w"> </span>message<span class="w"> </span>headers,<span class="w"> </span>reading<span class="w"> </span>body
<span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Got<span class="w"> </span>body<span class="w"> </span><span class="o">(</span><span class="m">256</span><span class="w"> </span>bytes<span class="o">)</span>
<span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Message:
--000000000000c4758905d87ddb81
Content-Type:<span class="w"> </span>text/plain<span class="p">;</span><span class="w"> </span><span class="nv">charset</span><span class="o">=</span><span class="s2">"UTF-8"</span>
What<span class="w"> </span>hath<span class="w"> </span>god<span class="w"> </span>wrought
--000000000000c4758905d87ddb81
Content-Type:<span class="w"> </span>text/html<span class="p">;</span><span class="w"> </span><span class="nv">charset</span><span class="o">=</span><span class="s2">"UTF-8"</span>
<div<span class="w"> </span><span class="nv">dir</span><span class="o">=</span><span class="s2">"ltr"</span>>What<span class="w"> </span>hath<span class="w"> </span>god<span class="w"> </span>wrought</div>
--000000000000c4758905d87ddb81--
<span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Connection<span class="w"> </span>closed
</pre></div>
<p>Which is pretty sweet!</p>
<h3 id="multipart-wut">Multipart wut</h3><p>Ok this body still clearly has some format. And if we dump the ARPA
text message headers we notice that Gmail 1) sets a Content-Type
header and 2) its value is <code>multipart/alternative</code>. I didn't
find Content-Type in RFC 822; it turns out to be defined in the later
MIME extension RFCs (RFC 2045 defines the header and RFC 2046 defines
the multipart types).</p>
<p>In any case this looks like multipart bodies in HTTP. I don't want to
deal with that so I'm just going to stop here.</p>
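<p>For what it's worth, Go's standard library can already take this
format apart with the <code>mime</code> and <code>mime/multipart</code>
packages. Here is a rough sketch, not something I wired into
<code>gomail</code>. The <code>textParts</code> helper and the
<code>XYZ</code> boundary are made up for illustration.</p>

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"strings"
)

// textParts is a hypothetical helper (not part of gomail). Given the value
// of the message's Content-Type header and the raw body, it returns a map
// from each part's media type to that part's content.
func textParts(contentType, body string) (map[string]string, error) {
	mediaType, params, err := mime.ParseMediaType(contentType)
	if err != nil {
		return nil, err
	}
	if !strings.HasPrefix(mediaType, "multipart/") {
		return nil, fmt.Errorf("not a multipart body: %s", mediaType)
	}
	parts := map[string]string{}
	r := multipart.NewReader(strings.NewReader(body), params["boundary"])
	for {
		p, err := r.NextPart()
		if err == io.EOF {
			return parts, nil // no more parts
		}
		if err != nil {
			return nil, err
		}
		content, err := io.ReadAll(p)
		if err != nil {
			return nil, err
		}
		partType, _, err := mime.ParseMediaType(p.Header.Get("Content-Type"))
		if err != nil {
			return nil, err
		}
		parts[partType] = string(content)
	}
}

func main() {
	body := "--XYZ\r\n" +
		"Content-Type: text/plain; charset=\"UTF-8\"\r\n\r\n" +
		"What hath god wrought\r\n" +
		"--XYZ\r\n" +
		"Content-Type: text/html; charset=\"UTF-8\"\r\n\r\n" +
		"<div dir=\"ltr\">What hath god wrought</div>\r\n" +
		"--XYZ--\r\n"
	parts, err := textParts(`multipart/alternative; boundary="XYZ"`, body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", parts["text/plain"])
}
```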
<p>But I <em>am</em> curious about text-only email systems. So I <code>sudo dnf
install php sendmail</code> and write a quick PHP script (thanks to @Josh on
Discord for the suggestion):</p>
<div class="highlight"><pre><span></span><span class="cp"><?php</span>
<span class="nb">mail</span><span class="p">(</span><span class="s2">"[email protected]"</span><span class="p">,</span> <span class="s2">"What hath god wrought"</span><span class="p">,</span> <span class="s2">"What hath god wrought"</span><span class="p">,</span> <span class="s2">""</span><span class="p">);</span>
<span class="cp">?></span>
</pre></div>
<p>And run it:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>php<span class="w"> </span>test.php
</pre></div>
<p>And in my <code>gomail</code> window I see:</p>
<div class="highlight"><pre><span></span><span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">17</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="n">Listening</span>
<span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="k">Connection</span><span class="w"> </span><span class="n">accepted</span>
<span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Awaiting</span><span class="w"> </span><span class="n">EHLO</span>
<span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Received</span><span class="w"> </span><span class="n">EHLO</span>
<span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Done</span><span class="w"> </span><span class="n">EHLO</span>
<span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Got</span><span class="w"> </span><span class="nl">header</span><span class="p">:</span><span class="w"> </span><span class="n">MAIL</span><span class="w"> </span><span class="k">From</span><span class="err">:</span><span class="o"><</span><span class="n">phil</span><span class="nv">@dev1</span><span class="p">.</span><span class="n">eatonphil</span><span class="p">.</span><span class="n">com</span><span class="o">></span>
<span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Got</span><span class="w"> </span><span class="nl">header</span><span class="p">:</span><span class="w"> </span><span class="n">RCPT</span><span class="w"> </span><span class="k">To</span><span class="err">:</span><span class="o"><</span><span class="n">morse</span><span class="nv">@binutils</span><span class="p">.</span><span class="n">org</span><span class="o">></span>
<span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Done</span><span class="w"> </span><span class="n">SMTP</span><span class="w"> </span><span class="n">headers</span><span class="p">,</span><span class="w"> </span><span class="n">reading</span><span class="w"> </span><span class="n">ARPA</span><span class="w"> </span><span class="nc">text</span><span class="w"> </span><span class="n">message</span><span class="w"> </span><span class="n">headers</span>
<span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Done</span><span class="w"> </span><span class="n">ARPA</span><span class="w"> </span><span class="nc">text</span><span class="w"> </span><span class="n">message</span><span class="w"> </span><span class="n">headers</span><span class="p">,</span><span class="w"> </span><span class="n">reading</span><span class="w"> </span><span class="n">body</span>
<span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Got</span><span class="w"> </span><span class="n">body</span><span class="w"> </span><span class="p">(</span><span class="mi">21</span><span class="w"> </span><span class="n">bytes</span><span class="p">)</span>
<span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="nl">Message</span><span class="p">:</span>
<span class="n">What</span><span class="w"> </span><span class="n">hath</span><span class="w"> </span><span class="n">god</span><span class="w"> </span><span class="n">wrought</span>
<span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="k">Connection</span><span class="w"> </span><span class="n">closed</span>
</pre></div>
<p>And I'm happy to call it a night.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new blog post on building an SMTP server from scratch in Go that is correctly enough hooked up you can receive emails sent from Gmail to it!<br><br>Good fun and some learning too.<a href="https://t.co/8pYkkAbFnI">https://t.co/8pYkkAbFnI</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1495586245896028160?ref_src=twsrc%5Etfw">February 21, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p>p.s. if you want to see more networking software/hardware internals
check out
<a href="https://reddit.com/r/networkdevelopment">/r/NetworkDevelopment</a>.</p>
http://notes.eatonphil.com/handling-email-from-gmail-smtp-protocol-basics.htmlSun, 20 Feb 2022 00:00:00 +0000
- The world of PostgreSQL wire compatibilityhttp://notes.eatonphil.com/the-world-of-postgresql-wire-compatibility.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-02-08-the-world-of-postgresql-wire-compatibility.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2022-02-08-the-world-of-postgresql-wire-compatibility.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/the-world-of-postgresql-wire-compatibility.htmlTue, 08 Feb 2022 00:00:00 +0000
- How to recommend books, or, stop recommending SICPhttp://notes.eatonphil.com/recommending-a-book.html<p>Many "must-read" books are not well-written. I <a href="https://www.goodreads.com/user/show/50930981-phil-eaton">try to read a
lot</a>, but I
still have a low tolerance for bad writing and bad editing. I write
this post both to discourage thoughtless recommendations and to
encourage the receivers of bad recommendations.</p>
<p>For software developers, Structure and Interpretation of Computer
Programs is a prime example. Written for freshmen at MIT, it is
ostensibly an entry-level text. But it requires such a level of
competence in math and physics, and the prose itself is so dense and
archaic, that I couldn't imagine suggesting it to anyone.</p>
<p>And yet it is one of the most recommended books for developers.</p>
<p>This is not to say that SICP is a bad book or that you should not read
it. I just don't think it should ever be suggested to anyone.</p>
<h3 id="goal">Goal</h3><p>The core goal of a book recommendation is for the reader to get
enjoyment or education from it. If you can't continue or finish a
book, you get nothing from it.</p>
<p>You, the recommender, diminish your impact if you can only recommend
books that people won't continue or finish.</p>
<h4 id="non-goal">Non-goal</h4><p>Some people have the capacity to read and love challenging books. If
that is you, you are not the audience of this post. I don't think
you'd disagree that most people are not like you.</p>
<h3 id="why">Why</h3><p>I have a few, not-mutually-exclusive guesses why "must-read" books are
often poorly written.</p>
<p>One guess is intelligence signalling: it is human nature for a
person to suggest a book in an attempt to make herself look smart rather
than to best assist the person asking for a suggestion.</p>
<p>Another guess is that most people don't read enough to have a good
feel for better or worse writing and editing.</p>
<p>And a final guess is that books that are worth reading might not
always be well-written. This is the most unfortunate guess of all. I
don't disagree that sometimes it is necessary to learn from
poorly-written books. But I begrudge this because of how much joy I
get from reading well-written books, fiction and non-fiction.</p>
<p>I have a feeling my guesses apply to recommendations in
general: music, art, film, musicals, restaurants, etc.</p>
<h3 id="instead">Instead</h3><p>My suggestion then to folks who are in the position of giving
recommendations:</p>
<ol>
<li>If you had a hard time reading a book or it took you too long to read it (yes, this threshold is different for everyone), don't recommend it</li>
<li>Don't be scared to recommend nothing, or to recommend against (rather than for) a certain book</li>
<li>Read more books</li>
</ol>
<p>And definitely don't recommend books you haven't read.</p>
<h3 id="mea-culpa">Mea culpa</h3><p>I've definitely done a bad job recommending books in the past,
including recommending books I haven't read. I've been trying to do
better in the last 5 years or so.</p>
<p>What do you think?</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new blog post: a bit of flame bait on how to recommend books and why so many must-read books are impossible to read.<br><br>Or: stop recommending SICP.<br><br>If you love challenging books, you are neither the norm nor the audience of this post. 😀<a href="https://t.co/ZU92kgr4Kf">https://t.co/ZU92kgr4Kf</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1488204810541219840?ref_src=twsrc%5Etfw">January 31, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/recommending-a-book.htmlMon, 31 Jan 2022 00:00:00 +0000
- Bootloader basicshttp://notes.eatonphil.com/bootloader-basics.html<p>I spent a few days playing around with bootloaders for the first
time. This post builds up to a text editor with a few keyboard
shortcuts. I'll be giving a virtual talk based on this work at <a href="https://www.meetup.com/hackernights/">Hacker
Nights on Jan 27</a>.</p>
<p>There are definitely bugs. But it's hard to find intermediate
resources for bootloader programming so maybe parts of this will be
useful.</p>
<p>If you already know the basics and the intermediates and just want a
fantastic intermediate+ tutorial, maybe try
<a href="https://0x00sec.org/t/realmode-assembly-writing-bootable-stuff-part-5/3667">this</a>. It
is very good.</p>
<p>The code in this post is available on
<a href="https://github.com/eatonphil/bootloaders">GitHub</a>, but it's more of a
mess than my usual project.</p>
<h3 id="motivation:-snake">Motivation: Snake</h3><p>Remember the <a href="https://www.quaxio.com/bootloader_retro_game_tweet/">snake bootloader in a
tweet</a> from a few
years ago?</p>
<p>Install qemu and nasm (on macOS or Linux), then copy the <code>snake.asm</code>
source code from that blog post to disk.</p>
<div class="highlight"><pre><span></span><span class="nf">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="nv">snake.asm</span>
<span class="w"> </span><span class="err">[</span><span class="k">bits</span><span class="w"> </span><span class="mi">16</span><span class="p">]</span><span class="w"> </span><span class="c1">; Pragma, tells the assembler that we</span>
<span class="w"> </span><span class="c1">; are in 16 bit mode (which is the state</span>
<span class="w"> </span><span class="c1">; of x86 when booting from a floppy).</span>
<span class="w"> </span><span class="err">[</span><span class="k">org</span><span class="w"> </span><span class="mh">0x7C00</span><span class="p">]</span><span class="w"> </span><span class="c1">; Pragma, tell the assembler where the</span>
<span class="w"> </span><span class="c1">; code will be loaded.</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">bl</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="c1">; Starting direction for the worm.</span>
<span class="w"> </span><span class="nf">push</span><span class="w"> </span><span class="mh">0xa000</span><span class="w"> </span><span class="c1">; Load address of VRAM into es.</span>
<span class="w"> </span><span class="nf">pop</span><span class="w"> </span><span class="nb">es</span>
<span class="nl">restart_game:</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">si</span><span class="p">,</span><span class="w"> </span><span class="mi">320</span><span class="o">*</span><span class="mi">100</span><span class="o">+</span><span class="mi">160</span><span class="w"> </span><span class="c1">; worm's starting position, center of</span>
<span class="w"> </span><span class="c1">; screen</span>
<span class="w"> </span><span class="c1">; Set video mode. Mode 13h is VGA (1 byte per pixel with the actual</span>
<span class="w"> </span><span class="c1">; color stored in a palette), 320x200 total size. When restarting,</span>
<span class="w"> </span><span class="c1">; this also clears the screen.</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ax</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0013</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="w"> </span><span class="c1">; Draw borders. We assume the default palette will work for us.</span>
<span class="w"> </span><span class="c1">; We also assume that starting at the bottom and drawing 2176 pixels</span>
<span class="w"> </span><span class="c1">; wraps around and ends up drawing the top + bottom borders.</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">di</span><span class="p">,</span><span class="w"> </span><span class="mi">320</span><span class="o">*</span><span class="mi">199</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">cx</span><span class="p">,</span><span class="w"> </span><span class="mi">2176</span>
<span class="w"> </span><span class="nf">rep</span>
<span class="nl">draw_loop:</span>
<span class="w"> </span><span class="nf">stosb</span><span class="w"> </span><span class="c1">; draw right border</span>
<span class="w"> </span><span class="nf">stosb</span><span class="w"> </span><span class="c1">; draw left border</span>
<span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="nb">di</span><span class="p">,</span><span class="w"> </span><span class="mi">318</span>
<span class="w"> </span><span class="nf">jnc</span><span class="w"> </span><span class="nv">draw_loop</span><span class="w"> </span><span class="c1">; notice the jump in the middle of the</span>
<span class="w"> </span><span class="c1">; rep stosb instruction.</span>
<span class="nl">game_loop:</span>
<span class="w"> </span><span class="c1">; We read the keyboard input from port 0x60. This also reads bytes from</span>
<span class="w"> </span><span class="c1">; the mouse, so we need to only handle [up (0x48), left (0x4b),</span>
<span class="w"> </span><span class="c1">; right (0x4d), down (0x50)]</span>
<span class="w"> </span><span class="nf">in</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x60</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x48</span>
<span class="w"> </span><span class="nf">jb</span><span class="w"> </span><span class="nv">kb_handle_end</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x50</span>
<span class="w"> </span><span class="nf">ja</span><span class="w"> </span><span class="nv">kb_handle_end</span>
<span class="w"> </span><span class="c1">; At the end bx contains offset displacement (+1, -1, +320, -320)</span>
<span class="w"> </span><span class="c1">; based on pressed/released keypad key. I bet there are a few bytes</span>
<span class="w"> </span><span class="c1">; to shave around here given the bounds check above.</span>
<span class="w"> </span><span class="nf">aaa</span>
<span class="w"> </span><span class="nf">cbw</span>
<span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="nb">ax</span>
<span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="nb">ax</span>
<span class="w"> </span><span class="nf">jc</span><span class="w"> </span><span class="nv">kb_handle</span>
<span class="w"> </span><span class="nf">sub</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span>
<span class="w"> </span><span class="nf">imul</span><span class="w"> </span><span class="nb">ax</span><span class="p">,</span><span class="w"> </span><span class="nb">ax</span><span class="p">,</span><span class="w"> </span><span class="kt">byte</span><span class="w"> </span><span class="o">-</span><span class="mh">0x50</span>
<span class="nl">kb_handle:</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">bx</span><span class="p">,</span><span class="w"> </span><span class="nb">ax</span>
<span class="nl">kb_handle_end:</span>
<span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="nb">si</span><span class="p">,</span><span class="w"> </span><span class="nb">bx</span>
<span class="w"> </span><span class="c1">; The original code used set pallete command (10h/0bh) to wait for</span>
<span class="w"> </span><span class="c1">; the vertical retrace. Today's computers are however too fast, so</span>
<span class="w"> </span><span class="c1">; we use int 15h 86h instead. This also shaves a few bytes.</span>
<span class="w"> </span><span class="c1">; Note: you'll have to tweak cx+dx if you are running this on a virtual</span>
<span class="w"> </span><span class="c1">; machine vs real hardware. Casual testing seems to show that virtual machines</span>
<span class="w"> </span><span class="c1">; wait ~3-4x longer than physical hardware.</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x86</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">dh</span><span class="p">,</span><span class="w"> </span><span class="mh">0xef</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x15</span>
<span class="w"> </span><span class="c1">; Draw worm and check for collision with parity</span>
<span class="w"> </span><span class="c1">; (even parity=collision).</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x45</span>
<span class="w"> </span><span class="nf">xor</span><span class="w"> </span><span class="p">[</span><span class="nb">es</span><span class="p">:</span><span class="nb">si</span><span class="p">],</span><span class="w"> </span><span class="nb">ah</span>
<span class="w"> </span><span class="c1">; Go back to the main game loop.</span>
<span class="w"> </span><span class="nf">jpo</span><span class="w"> </span><span class="nv">game_loop</span>
<span class="w"> </span><span class="c1">; We hit a wall or the worm. Restart the game.</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">restart_game</span>
<span class="kd">TIMES</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="kc">$</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="kc">$$</span><span class="p">)</span><span class="w"> </span><span class="nv">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; Fill the rest of sector with 0</span>
<span class="kd">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="c1">; Boot signature at the end of bootloader</span>
</pre></div>
<p>Now run:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>nasm<span class="w"> </span>-f<span class="w"> </span>bin<span class="w"> </span>snake.asm<span class="w"> </span>-o<span class="w"> </span>snake.bin
$<span class="w"> </span>qemu-system-x86_64<span class="w"> </span>-fda<span class="w"> </span>snake.bin
</pre></div>
<p><img src="bootloader-basics-snake.gif" alt="Recording of snake bootloader"></p>
<p>What a phenomenal hack.</p>
<p>I'm not going to get anywhere near that level of sophistication in
this post but I think it's great motivation.</p>
<h3 id="hello-world">Hello world</h3><p>Bootloaders are a mix of assembly programming and BIOS APIs for
I/O. Since you're thinking about bootloaders you already know assembly
basics. Now all you have to do is learn the APIs.</p>
<p>The hello world bootloader has been explained in detail (see
<a href="https://github.com/briansteffens/briansteffens.github.io/blob/master/blog/hello-world-from-a-bootloader/post.md">here</a>,
<a href="https://www.ired.team/miscellaneous-reversing-forensics/windows-kernel-internals/writing-a-custom-bootloader">here</a>,
and <a href="http://3zanders.co.uk/2017/10/13/writing-a-bootloader/">here</a>) so
I won't go into too much line-by-line depth.</p>
<p>In fact, let's just pull the code from the last of those blog posts.</p>
<div class="highlight"><pre><span></span><span class="nf">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="nv">hello.asm</span>
<span class="k">bits</span><span class="w"> </span><span class="mi">16</span><span class="w"> </span><span class="c1">; tell NASM this is 16 bit code</span>
<span class="k">org</span><span class="w"> </span><span class="mh">0x7c00</span><span class="w"> </span><span class="c1">; tell NASM to start outputting stuff at offset 0x7c00</span>
<span class="nl">boot:</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">si</span><span class="p">,</span><span class="nv">hello</span><span class="w"> </span><span class="c1">; point si register to hello label memory location</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="mh">0x0e</span><span class="w"> </span><span class="c1">; 0x0e means 'Write Character in TTY mode'</span>
<span class="nl">.loop:</span>
<span class="w"> </span><span class="nf">lodsb</span>
<span class="w"> </span><span class="nf">or</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="nb">al</span><span class="w"> </span><span class="c1">; is al == 0 ?</span>
<span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">halt</span><span class="w"> </span><span class="c1">; if (al == 0) jump to halt label</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span><span class="w"> </span><span class="c1">; runs BIOS interrupt 0x10 - Video Services</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.loop</span>
<span class="nl">halt:</span>
<span class="w"> </span><span class="nf">cli</span><span class="w"> </span><span class="c1">; clear interrupt flag</span>
<span class="w"> </span><span class="nf">hlt</span><span class="w"> </span><span class="c1">; halt execution</span>
<span class="nl">hello:</span><span class="w"> </span><span class="kd">db</span><span class="w"> </span><span class="s">"Hello world!"</span><span class="p">,</span><span class="mi">0</span>
<span class="kd">times</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="kc">$</span><span class="o">-</span><span class="kc">$$</span><span class="p">)</span><span class="w"> </span><span class="nv">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; pad the rest of the first 510 bytes with zeroes</span>
<span class="kd">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="c1">; magic bootloader magic - marks this 512 byte sector bootable!</span>
</pre></div>
<p>The computer boots, prints "Hello world!" and hangs.</p>
<p>But aside from clerical settings (16-bit assembly, where the program
exists in memory, padding to 512 bytes) the only real bootloader-y
magic in there is <code>int 0x10</code>, a BIOS interrupt.</p>
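<p>The padding and the final <code>dw 0xaa55</code> are what make a
legacy BIOS treat the sector as bootable: the image must be exactly 512
bytes and end in the bytes <code>0x55 0xAA</code> (the word is stored
little-endian). As a sanity check on assembled output, here is a small
Python sketch. It is not part of the original code, and the helper names
<code>is_boot_sector</code> and <code>pad_to_boot_sector</code> are
hypothetical.</p>

```python
SECTOR_SIZE = 512
# `dw 0xaa55` is stored little-endian, so the low byte 0x55 comes first on disk.
BOOT_SIGNATURE = b"\x55\xaa"


def is_boot_sector(image):
    """True if `image` is one 512-byte sector ending in the boot signature."""
    return len(image) == SECTOR_SIZE and image[510:512] == BOOT_SIGNATURE


def pad_to_boot_sector(code):
    """Mimic `times 510 - ($-$$) db 0` followed by `dw 0xaa55`."""
    if len(code) > 510:
        raise ValueError("code does not fit in the first 510 bytes")
    return code + bytes(510 - len(code)) + BOOT_SIGNATURE


if __name__ == "__main__":
    # 0xFA 0xF4 is `cli; hlt`, the two instructions at the `halt` label.
    sector = pad_to_boot_sector(b"\xfa\xf4")
    print(len(sector), is_boot_sector(sector))
```

<p>To check a real image, something like
<code>is_boot_sector(open("hello.bin", "rb").read())</code> should work
once <code>hello.bin</code> has been assembled.</p>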
<h4 id="bios-interrupts-=-api-calls-for-i/o">BIOS interrupts = API calls for I/O</h4><p>BIOS interrupts are API calls. Just like syscalls in userland programs,
each family of APIs has an interrupt number to call and each function
in the family has its own register convention.</p>
<p>When you write bootloader programs you'll spend most of your time at
first trying to understand the behavior of the various BIOS APIs.</p>
<p>The two families we'll deal with in this post are the keyboard family
(documentation <a href="https://stanislavs.org/helppc/int_16.html">here</a>) and
the display family (documentation
<a href="https://stanislavs.org/helppc/int_10.html">here</a>).</p>
<h4 id="run-hello-world">Run hello world</h4><p>Anyway, back to the hello world. Assemble it with nasm and run it with
qemu.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>nasm<span class="w"> </span>-f<span class="w"> </span>bin<span class="w"> </span>hello.asm<span class="w"> </span>-o<span class="w"> </span>hello.bin
$<span class="w"> </span>qemu-system-x86_64<span class="w"> </span>-fda<span class="w"> </span>hello.bin
</pre></div>
<p><img src="bootloader-basics-hello.gif" alt="Printing hello world"></p>
<p>Getting the hang of it?</p>
<h3 id="io-loop">IO Loop</h3><p>The specific function we called above to write a character to the
display is <a href="https://stanislavs.org/helppc/int_10-e.html">INT
10,E</a>. The <code>0x10</code>
is the argument you pass to the <code>int</code> instruction
(e.g. <code>int 0x10</code>). And the <code>E</code> is the specific
function within the <code>0x10</code> family. The <code>E</code> is
written into the <code>AH</code> register before
calling <code>int</code>. The ASCII code to be written is placed in
the <code>AL</code> register.</p>
<p>Now that output makes some sense, let's do input. In the <a href="https://stanislavs.org/helppc/int_16.html">keyboard services
documentation</a> you may
notice that <a href="https://stanislavs.org/helppc/int_16-0.html">INT 16,0</a>
provides a way to block for user input. According to that page the
ASCII character will be in <code>AL</code> when the interrupt returns.</p>
<h4 id="clearing-the-screen">Clearing the screen</h4><p>You may have noticed some text gets displayed before our program
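runs. We can use <a href="https://stanislavs.org/helppc/int_10-0.html">INT
0x10,0</a> to clear the screen: it sets the video mode (here
<code>0x03</code>, 80x25 color text), and setting the mode clears the
screen as a side effect.</p>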
<div class="highlight"><pre><span></span> ;; Clear screen
mov ah, 0x00
mov al, 0x03
int 0x10
</pre></div>
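<p>As a recap of the register conventions so far, the three BIOS calls
used in this post can be written down as a small lookup table. This is
only a Python summary of the linked HelpPC pages, not code that runs in
the bootloader. The <code>BIOS_CALLS</code> and <code>describe</code>
names are hypothetical, and only the registers this post relies on are
listed.</p>

```python
# Keyed by (interrupt number, value placed in AH before calling `int`).
# Only the registers used in this post are listed; the BIOS docs name more.
BIOS_CALLS = {
    (0x10, 0x00): {
        "name": "set video mode",
        "inputs": "AL = video mode (0x03 is 80x25 color text)",
        "outputs": "none",
    },
    (0x10, 0x0E): {
        "name": "write character in TTY mode",
        "inputs": "AL = ASCII character",
        "outputs": "none",
    },
    (0x16, 0x00): {
        "name": "wait for keypress and read character",
        "inputs": "none",
        "outputs": "AH = scan code, AL = ASCII character",
    },
}


def describe(interrupt, ah):
    """Format one BIOS call the way the HelpPC docs name it, e.g. INT 10,E."""
    call = BIOS_CALLS[(interrupt, ah)]
    return "INT {:X},{:X}: {} (in: {}; out: {})".format(
        interrupt, ah, call["name"], call["inputs"], call["outputs"]
    )


if __name__ == "__main__":
    for interrupt, ah in sorted(BIOS_CALLS):
        print(describe(interrupt, ah))
```

<p>The table makes one detail easy to spot: the keyboard read leaves the
ASCII character in <code>AL</code>, which is the same register the TTY
write takes its character from.</p>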
<h4 id="all-together">All together</h4><p>Since the display function reads from the same register the input
function writes to (<code>AL</code>), we can just call the two interrupts one
after the other. Wrap this in a loop and we have the world's worst editor.</p>
<div class="highlight"><pre><span></span><span class="o">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="n">ioloop</span><span class="o">.</span><span class="n">asm</span>
<span class="n">bits</span><span class="w"> </span><span class="mi">16</span>
<span class="n">org</span><span class="w"> </span><span class="mh">0x7c00</span>
<span class="n">main</span><span class="p">:</span>
<span class="w"> </span><span class="p">;;</span><span class="w"> </span><span class="n">Clear</span><span class="w"> </span><span class="n">screen</span>
<span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x00</span>
<span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span>
<span class="w"> </span><span class="nb nb-Type">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="o">.</span><span class="n">loop</span><span class="p">:</span>
<span class="w"> </span><span class="p">;;</span><span class="w"> </span><span class="n">Read</span><span class="w"> </span><span class="n">character</span>
<span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ah</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nb nb-Type">int</span><span class="w"> </span><span class="mh">0x16</span>
<span class="w"> </span><span class="p">;;</span><span class="w"> </span><span class="n">Print</span><span class="w"> </span><span class="n">character</span>
<span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0e</span>
<span class="w"> </span><span class="nb nb-Type">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="w"> </span><span class="n">jmp</span><span class="w"> </span><span class="o">.</span><span class="n">loop</span>
<span class="n">times</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="o">$-$$</span><span class="p">)</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="n">pad</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">rest</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="n">bytes</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">zeroes</span>
<span class="n">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="n">bootloader</span><span class="w"> </span><span class="n">magic</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">marks</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span class="mi">512</span><span class="w"> </span><span class="n">byte</span><span class="w"> </span><span class="n">sector</span><span class="w"> </span><span class="n">bootable</span><span class="o">!</span>
</pre></div>
<p class="note">
By the way, the <code>main</code> label here (like
the <code>boot</code> label above in <code>hello.asm</code>) is only
to help the reader. It is not something the BIOS uses.
</p><p>Now that we've got the code, let's run it!</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>nasm<span class="w"> </span>-f<span class="w"> </span>bin<span class="w"> </span>ioloop.asm<span class="w"> </span>-o<span class="w"> </span>ioloop.bin
$<span class="w"> </span>qemu-system-x86_64<span class="w"> </span>-fda<span class="w"> </span>ioloop.bin
</pre></div>
<p><img src="bootloader-basics-ioloop.gif" alt="Recording of ioloop bootloader"></p>
<h3 id="digression-on-abstraction">Digression on abstraction</h3><p>There are two ways to build abstractions: assembly functions and nasm
macros.</p>
<p>We could build a clear screen function like this:</p>
<div class="highlight"><pre><span></span><span class="nl">clear_screen:</span>
<span class="w"> </span><span class="c1">;; Clear screen</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x00</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="w"> </span><span class="nf">ret</span>
</pre></div>
<p>And then we can call this in the ioloop program like so:</p>
<div class="highlight"><pre><span></span><span class="k">bits</span><span class="w"> </span><span class="mi">16</span>
<span class="k">org</span><span class="w"> </span><span class="mh">0x7c00</span>
<span class="nf">jmp</span><span class="w"> </span><span class="nv">main</span>
<span class="nl">clear_screen:</span>
<span class="w"> </span><span class="c1">;; Clear screen</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x00</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="w"> </span><span class="nf">ret</span>
<span class="nl">main:</span>
<span class="w"> </span><span class="nf">call</span><span class="w"> </span><span class="nv">clear_screen</span>
<span class="nl">.loop:</span>
<span class="w"> </span><span class="c1">;; Read character</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x16</span>
<span class="w"> </span><span class="c1">;; Print character</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0e</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.loop</span>
<span class="kd">times</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="kc">$</span><span class="o">-</span><span class="kc">$$</span><span class="p">)</span><span class="w"> </span><span class="nv">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; pad the rest of the first 510 bytes with zeroes</span>
<span class="kd">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="c1">; bootloader magic number - marks this 512 byte sector bootable!</span>
</pre></div>
<p>On the other hand, here is the same thing as a macro:</p>
<div class="highlight"><pre><span></span><span class="k">bits</span><span class="w"> </span><span class="mi">16</span>
<span class="k">org</span><span class="w"> </span><span class="mh">0x7c00</span>
<span class="nf">jmp</span><span class="w"> </span><span class="nv">main</span>
<span class="cp">%macro cls 0 </span><span class="c1">; Zero is the number of arguments</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x00</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="cp">%endmacro</span>
<span class="nl">main:</span>
<span class="w"> </span><span class="nf">cls</span>
<span class="nl">.loop:</span>
<span class="w"> </span><span class="c1">;; Read character</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x16</span>
<span class="w"> </span><span class="c1">;; Print character</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0e</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.loop</span>
<span class="kd">times</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="kc">$</span><span class="o">-</span><span class="kc">$$</span><span class="p">)</span><span class="w"> </span><span class="nv">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; pad the rest of the first 510 bytes with zeroes</span>
<span class="kd">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="c1">; bootloader magic number - marks this 512 byte sector bootable!</span>
</pre></div>
<p>And nasm macros even have a way to write macro-local labels: prefix
a label with <code>%%</code> and each expansion of the macro gets its own
unique copy, which is useful if you have
conditions or loops within a macro.</p>
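<p>As a rough sketch of what <code>%%</code> labels buy you, here is a
hypothetical <code>print_string</code> macro (it is not part of the editor
below) that loops over a zero-terminated string. It assumes <code>DS</code>
is zero so that the label address resolves correctly, and it clobbers
<code>AX</code> and <code>SI</code>. Because each expansion gets its own
copy of <code>%%next</code> and <code>%%done</code>, the macro can be used
more than once without duplicate label errors.</p>
<div class="highlight"><pre><span></span>;; Hypothetical example, not from the original programs
%macro print_string 1
  mov si, %1        ; SI = address of the string (macro argument)
  cld               ; make lodsb walk forward through memory
%%next:
  lodsb             ; AL = byte at DS:SI, then SI += 1
  cmp al, 0
  jz %%done         ; stop at the terminating zero byte
  mov ah, 0x0e      ; INT 10,E: write the character in AL
  int 0x10
  jmp %%next
%%done:
%endmacro

;; Usage (greeting is a made-up label):
;;   print_string greeting
;;   print_string greeting   ; second expansion gets fresh labels
;;   ...
;; greeting: db "hi", 0
</pre></div>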
<p>The benefit of a macro, I guess, is that you're not using up the
stack: a <code>call</code> pushes a return address, while a macro is
simply expanded inline. The benefit of a function call is that you're
not duplicating code every place you use it. The amount of generated
code eventually becomes important in bootloaders because the code plus
the two-byte boot signature must fit into 512 bytes.</p>
<p>I lean more toward using macros in this code.</p>
<h3 id="complex-input">Complex input</h3><p>Reading ASCII characters is not complicated, as we saw above. But what
if we want to build Readline-style shortcuts like ctrl-a for jumping
to the start of the line?</p>
<p>Using INT 16,0 as we do above is fine. But rather than relying solely
on the registers that call returns, we can also read the BIOS data
area, a section of memory that holds both the keyboard buffer (the keys
pressed) and flags for modifier keys like ctrl.</p>
<p>Based on documentation for this memory area (found
<a href="http://www.techhelpmanual.com/93-rom_bios_variables.html">here</a> or
<a href="https://www.tau.ac.il/~flaxer/edu/course/processcontrol/BiosDataArea.pdf">here</a>),
we can build a macro for reading the pressed character:</p>
<div class="highlight"><pre><span></span><span class="cp">%macro mov_read_character_into 1</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="mh">0x041a</span><span class="p">]</span>
<span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03fe</span><span class="w"> </span><span class="c1">; Offset from 0x0400 - sizeof(uint16) (since head points to next free slot, not last/current slot)</span>
<span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="mh">0xFFFF</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="nb">eax</span><span class="p">]</span>
<span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mh">0xFF</span>
<span class="cp">%endmacro</span>
</pre></div>
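<p>One caveat with <code>mov_read_character_into</code>: the keyboard
buffer is a ring, so stepping back two bytes from the head can land before
the start of the buffer. Here is a sketch of a variant that wraps around.
The macro name is made up, and it assumes <code>DS</code> is zero, the
conventional buffer location (<code>0x041e</code> through
<code>0x043d</code>), and a 16-bit register argument other than
<code>BX</code>, which it clobbers. It is only a sketch under those
assumptions and is not what the editor below uses.</p>
<div class="highlight"><pre><span></span>;; Hypothetical variant, not from the original editor
%macro mov_last_key_into 1
  mov bx, [0x041a]       ; head: offset from 0x0400 of the next unread key
  sub bx, 2              ; step back to the key INT 16,0 just consumed
  cmp bx, 0x001e         ; still inside the buffer?
  jae %%in_range
  mov bx, 0x003c         ; no: wrap to the last of the 16 two-byte slots
%%in_range:
  mov %1, [bx + 0x0400]  ; low byte = ASCII code, high byte = scan code
  and %1, 0x00ff         ; keep only the ASCII code
%endmacro
</pre></div>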
<p>And another macro for checking whether the ctrl key is currently held down:</p>
<div class="highlight"><pre><span></span><span class="cp">%macro mov_read_ctrl_flag_into 1</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="mh">0x0417</span><span class="p">]</span>
<span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mh">0x04</span><span class="w"> </span><span class="c1">; Grab 3rd bit: %1 & 0b0100</span>
<span class="cp">%endmacro</span>
</pre></div>
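<h3 id="cursor-location">Cursor location</h3><p>Lastly we'll use some cursor APIs that allow us to handle
newlines, backspace on the first column of a line, and ctrl-a (jump to
beginning of line). INT 10,3 reads the cursor position into
<code>DH</code> (row) and <code>DL</code> (column); INT 10,2 moves the
cursor to the position in those same registers.</p>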
<div class="highlight"><pre><span></span><span class="cp">%macro get_position 0</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="cp">%endmacro</span>
<span class="cp">%macro set_position 0</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x02</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="cp">%endmacro</span>
</pre></div>
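<p>One thing these macros leave implicit: both BIOS calls also take a
display page number in <code>BH</code>. A slightly more defensive sketch,
assuming the default page 0 (the <code>_page0</code> names are made up for
this example), sets it explicitly:</p>
<div class="highlight"><pre><span></span>;; Hypothetical variants, not from the original editor
%macro get_position_page0 0
  mov bh, 0x00    ; display page 0 (assumed)
  mov ah, 0x03    ; INT 10,3: read cursor position
  int 0x10        ; returns DH = row, DL = column
%endmacro

%macro set_position_page0 0
  mov bh, 0x00    ; display page 0 (assumed)
  mov ah, 0x02    ; INT 10,2: move cursor to DH = row, DL = column
  int 0x10
%endmacro
</pre></div>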
<p>There's something buggy about my <code>goto_end_of_line</code>
function, though. Sometimes it works and sometimes it just jumps all over the
screen in an infinite loop. Part of the problem is that the editor's only
memory is the video card's. The cursor location is only stored there and
not in some program state like you might do in a high-level
environment/language.</p>
<div class="highlight"><pre><span></span><span class="nl">goto_end_of_line:</span>
<span class="c1">;; Get current character</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x08</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="c1">;; Iterate until the character is null</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span>
<span class="w"> </span><span class="nf">inc</span><span class="w"> </span><span class="nb">dl</span>
<span class="w"> </span><span class="nf">set_position</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">goto_end_of_line</span>
<span class="nl">.done:</span>
<span class="w"> </span><span class="nf">ret</span>
</pre></div>
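<p>One likely contributor to the runaway loop is that nothing stops
<code>DL</code> from running past the last column, and the function itself
never fetches the current cursor position. A bounded sketch, assuming an
80-column text mode and display page 0 (the function name is made up, and
this is not the version used in the editor below):</p>
<div class="highlight"><pre><span></span>;; Hypothetical variant, not from the original editor
goto_end_of_line_bounded:
  mov bh, 0x00      ; display page 0 (assumed) for the calls below
  mov ah, 0x03
  int 0x10          ; INT 10,3: DH = current row, DL = current column
.next:
  cmp dl, 79        ; stop at the last column so DL never runs off the row
  jae .done
  mov ah, 0x08
  int 0x10          ; INT 10,8: AL = character under the cursor
  cmp al, 0
  jz .done          ; a null cell marks the end of the line
  inc dl
  mov ah, 0x02
  int 0x10          ; INT 10,2: move the cursor to (DH, DL)
  jmp .next
.done:
  ret
</pre></div>
<p>If the BIOS fills a cleared screen with spaces rather than zero bytes,
the null test may never fire, and this version simply stops at column
79.</p>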
<p>Alright, let's put all these pieces together.</p>
<h3 id="editor-with-keyboard-shortcuts">Editor with keyboard shortcuts</h3><p>Start with the basics in <code>editor.asm</code>.</p>
<div class="highlight"><pre><span></span><span class="c1">; -*- mode: nasm;-*-</span>
<span class="k">bits</span><span class="w"> </span><span class="mi">16</span>
<span class="k">org</span><span class="w"> </span><span class="mh">0x7c00</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">main</span>
</pre></div>
<p>Then add a clear screen macro.</p>
<div class="highlight"><pre><span></span><span class="cp">%macro cls 0</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x00</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="cp">%endmacro</span>
</pre></div>
<p>Add macros for reading and printing.</p>
<div class="highlight"><pre><span></span><span class="cp">%macro read_character 0</span>
<span class="w"> </span><span class="c1">;; Read character</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x16</span>
<span class="cp">%endmacro</span>
<span class="cp">%macro print_character 1</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ax</span><span class="p">,</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0e</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="cp">%endmacro</span>
</pre></div>
<p>Add cursor utilities.</p>
<div class="highlight"><pre><span></span><span class="cp">%macro get_position 0</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="cp">%endmacro</span>
<span class="cp">%macro set_position 0</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x02</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="cp">%endmacro</span>
<span class="nl">goto_end_of_line:</span>
<span class="c1">;; Get current character</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x08</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="c1">;; Iterate until the character is null</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span>
<span class="w"> </span><span class="nf">inc</span><span class="w"> </span><span class="nb">dl</span>
<span class="w"> </span><span class="nf">set_position</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">goto_end_of_line</span>
<span class="nl">.done:</span>
<span class="w"> </span><span class="nf">ret</span>
</pre></div>
<p>And keyboard utilities.</p>
<div class="highlight"><pre><span></span><span class="cp">%macro mov_read_ctrl_flag_into 1</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="mh">0x0417</span><span class="p">]</span>
<span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mh">0x04</span><span class="w"> </span><span class="c1">; Grab 3rd bit: %1 & 0b0100</span>
<span class="cp">%endmacro</span>
<span class="cp">%macro mov_read_character_into 1</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="mh">0x041a</span><span class="p">]</span>
<span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03fe</span><span class="w"> </span><span class="c1">; Offset from 0x0400 - sizeof(uint16) (since head points to next free slot, not last/current slot)</span>
<span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="mh">0xFFFF</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="nb">eax</span><span class="p">]</span>
<span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mh">0xFF</span>
<span class="cp">%endmacro</span>
</pre></div>
<p>Now we can start the editor loop where we wait for a keypress and
handle it.</p>
<div class="highlight"><pre><span></span><span class="nl">editor_action:</span>
<span class="w"> </span><span class="nf">read_character</span>
</pre></div>
<p>Don't print ASCII garbage if the key pressed is an arrow key. Just do
nothing. Arrow keys are recognized by the scan code in <code>AH</code>.
(This isn't good editor behavior in general but ours is a
limited one.)</p>
<div class="highlight"><pre><span></span><span class="c1">;; Ignore arrow keys</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x4b</span><span class="w"> </span><span class="c1">; Left</span>
<span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x50</span><span class="w"> </span><span class="c1">; Down</span>
<span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x4d</span><span class="w"> </span><span class="c1">; Right</span>
<span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x48</span><span class="w"> </span><span class="c1">; Up</span>
<span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span>
</pre></div>
<p>Next handle backspace.</p>
<div class="highlight"><pre><span></span><span class="c1">;; Handle backspace</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x08</span>
<span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.is_backspace</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x7F</span><span class="w"> </span><span class="c1">; For mac keyboards</span>
<span class="w"> </span><span class="nf">jnz</span><span class="w"> </span><span class="nv">.done_backspace</span>
<span class="nl">.is_backspace:</span>
<span class="w"> </span><span class="nf">get_position</span>
</pre></div>
<p>If this key is pressed at the first line and the first column, don't
move the cursor; just overwrite whatever character is there.</p>
<div class="highlight"><pre><span></span><span class="c1">;; Handle 0,0 coordinate (do nothing)</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="nb">dh</span>
<span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="nb">dl</span>
<span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.overwrite_character</span>
</pre></div>
<p>Otherwise, if backspace is pressed anywhere but the first column,
move the cursor back one column and overwrite that character with
ASCII 0 (the code 0, not the digit 0).</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">dl</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.backspace_at_start_of_line</span>
<span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="nb">dl</span><span class="w"> </span><span class="c1">; Decrement column</span>
<span class="w"> </span><span class="nf">set_position</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.overwrite_character</span>
</pre></div>
<p>Otherwise you're at the beginning of the line and you need to jump to
the end of the previous line.</p>
<div class="highlight"><pre><span></span><span class="nl">.backspace_at_start_of_line:</span>
<span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="nb">dh</span><span class="w"> </span><span class="c1">; Decrement row</span>
<span class="w"> </span><span class="nf">set_position</span>
<span class="w"> </span><span class="nf">call</span><span class="w"> </span><span class="nv">goto_end_of_line</span>
<span class="nl">.overwrite_character:</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0a</span>
<span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.done</span>
<span class="nl">.done_backspace:</span>
</pre></div>
<p>Next we handle the Enter key. This should move the cursor onto the
next line and set the column back to zero.</p>
<div class="highlight"><pre><span></span><span class="c1">;; Handle enter</span>
<span class="w"> </span><span class="nf">mov_read_character_into</span><span class="w"> </span><span class="nb">ax</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0d</span>
<span class="w"> </span><span class="nf">jnz</span><span class="w"> </span><span class="nv">.done_enter</span>
<span class="w"> </span><span class="nf">get_position</span>
<span class="w"> </span><span class="nf">inc</span><span class="w"> </span><span class="nb">dh</span><span class="w"> </span><span class="c1">; Increment line</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">dl</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; Reset column</span>
<span class="w"> </span><span class="nf">set_position</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.done</span>
<span class="nl">.done_enter:</span>
</pre></div>
<p>Next we handle ctrl-a, jump to start of line.</p>
<div class="highlight"><pre><span></span><span class="c1">;; Handle ctrl- shortcuts</span>
<span class="c1">;; Check ctrl key</span>
<span class="w"> </span><span class="nf">mov_read_ctrl_flag_into</span><span class="w"> </span><span class="nb">ax</span>
<span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.ctrl_not_set</span>
<span class="c1">;; Handle ctrl-a shortcut</span>
<span class="w"> </span><span class="nf">mov_read_character_into</span><span class="w"> </span><span class="nb">ax</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="c1">; With ctrl held, a-z arrive as ASCII control codes 1-26</span>
<span class="w"> </span><span class="nf">jnz</span><span class="w"> </span><span class="nv">.not_ctrl_a</span>
<span class="c1">;; Reset column (fetch the current row first so DH is valid)</span>
<span class="w"> </span><span class="nf">get_position</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">dl</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nf">set_position</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.done</span>
<span class="nl">.not_ctrl_a:</span>
</pre></div>
<p>For ctrl-e, jump to the end of the line.</p>
<div class="highlight"><pre><span></span><span class="c1">;; Handle ctrl-e shortcut</span>
<span class="w"> </span><span class="nf">mov_read_character_into</span><span class="w"> </span><span class="nb">ax</span>
<span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span>
<span class="w"> </span><span class="nf">jnz</span><span class="w"> </span><span class="nv">.not_ctrl_e</span>
<span class="w"> </span><span class="nf">call</span><span class="w"> </span><span class="nv">goto_end_of_line</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.done</span>
<span class="nl">.not_ctrl_e:</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.done</span>
<span class="nl">.ctrl_not_set:</span>
</pre></div>
<p>If none of these cases match, just print the pressed character and return.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">mov_read_character_into</span><span class="w"> </span><span class="nb">ax</span>
<span class="w"> </span><span class="nf">print_character</span><span class="w"> </span><span class="nb">ax</span>
<span class="nl">.done:</span>
<span class="w"> </span><span class="nf">ret</span>
</pre></div>
<p>Finally, create the main function that calls this editor code in a loop.</p>
<div class="highlight"><pre><span></span><span class="nl">main:</span>
<span class="w"> </span><span class="nf">cls</span>
<span class="nl">.loop:</span>
<span class="w"> </span><span class="nf">call</span><span class="w"> </span><span class="nv">editor_action</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.loop</span>
<span class="kd">times</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="kc">$</span><span class="o">-</span><span class="kc">$$</span><span class="p">)</span><span class="w"> </span><span class="nv">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; pad the rest of the first 510 bytes with zeroes</span>
<span class="kd">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="c1">; bootloader magic number - marks this 512 byte sector bootable!</span>
</pre></div>
<p>And we're done! Try it out:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>nasm<span class="w"> </span>-f<span class="w"> </span>bin<span class="w"> </span>editor.asm<span class="w"> </span>-o<span class="w"> </span>editor.bin
$<span class="w"> </span>qemu-system-x86_64<span class="w"> </span>-fda<span class="w"> </span>editor.bin
</pre></div>
<p><img src="bootloader-basics-editor.gif" alt="Recording of a bad editor"></p>
<p>Tedious and buggy! But I learned something, I think.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new post on my first time exploring bootloader basics! Neat to discover the BIOS APIs and spend some time actually coding in assembly versus just generating or emulating it.<a href="https://t.co/7iP6Nib620">https://t.co/7iP6Nib620</a> <a href="https://t.co/xSyG1IXgEB">pic.twitter.com/xSyG1IXgEB</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1485398216124346371?ref_src=twsrc%5Etfw">January 23, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/bootloader-basics.htmlSun, 23 Jan 2022 00:00:00 +0000
- dsq: Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.http://notes.eatonphil.com/dsq.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-01-11-dsq.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2022-01-11-dsq.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/dsq.htmlTue, 11 Jan 2022 00:00:00 +0000
- Analyzing large JSON files via partial JSON parsinghttp://notes.eatonphil.com/analyzing-large-json-files-via-partial-json-parsing.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-01-06-analyzing-large-json-files-via-partial-json-parsing.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2022-01-06-analyzing-large-json-files-via-partial-json-parsing.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/analyzing-large-json-files-via-partial-json-parsing.htmlThu, 06 Jan 2022 00:00:00 +0000
- The year in books: 11 to recommend in 2021http://notes.eatonphil.com/year-in-books-2021.html<p>Last year (2021) I finished 17 books, a five-year low. But that's ok!
4 fiction and 13 non-fiction. Another 30 started but not finished.</p>
<h3 id="non-fiction">Non-fiction</h3><p>It seems I was pretty focused on business history books and history of
tech. The 8 non-fiction books I liked the most:</p>
<ul>
<li><a href="https://www.goodreads.com/book/show/34626431-designing-data-intensive-applications">Designing Data-Intensive Applications</a>, a must-read for anyone interacting with a database</li>
<li><a href="https://www.goodreads.com/book/show/24715220-my-years-with-general-motors">My Years with General Motors</a>, the business school classic; truly a good read. But sad to know that shortly after it was written, GM succumbed to Japanese and South Korean competition</li>
<li><a href="https://www.goodreads.com/book/show/49195924-no-rules-rules">No Rules Rules: Netflix and the Culture of Reinvention</a></li>
<li><a href="https://www.goodreads.com/book/show/55297149-working-backwards">Working Backwards: Insights, Stories, and Secrets from Inside Amazon</a></li>
<li><a href="https://www.goodreads.com/book/show/54216469-working-in-public">Working in Public: The Making and Maintenance of Open Source Software</a>, my review <a href="https://www.goodreads.com/review/show/3478346828?book_show_action=false&from_review_page=1">here</a></li>
<li><a href="https://www.goodreads.com/book/show/22401445-intel-trinity-the">The Intel Trinity</a>, an early history of Intel</li>
<li><a href="https://www.goodreads.com/book/show/19383579-the-hp-way">The HP Way</a></li>
<li><a href="https://www.goodreads.com/book/show/58208477-play-nice-but-win">Play Nice But Win</a>, the story of Dell computers</li>
<li><a href="https://www.goodreads.com/book/show/36316219-west-with-the-night">West with the Night</a>, beautiful memoir recommended by Ernest Hemingway and written in a similar style. Much more enjoyable than the other more popular colonial-African memoir, Out of Africa.</li>
</ul>
<h4 id="the-rest">The rest</h4><ul>
<li><a href="https://www.goodreads.com/book/show/16059922-pour-your-heart-into-it">Pour Your Heart Into It: How Starbucks Built a Company One Cup at a Time</a></li>
<li><a href="https://www.goodreads.com/book/show/43063719-jump-starting-america">Jump-Starting America: How Breakthrough Science Can Revive Economic Growth and the American Dream</a></li>
<li><a href="https://www.goodreads.com/book/show/9118033-rework">ReWork</a></li>
<li><a href="https://www.goodreads.com/book/show/297901.Russia_and_the_Russians">Russia and the Russians: A History</a></li>
</ul>
<h3 id="fiction">Fiction</h3><p>The 3 fiction books I liked the most:</p>
<ul>
<li><a href="https://www.goodreads.com/book/show/12970829-a-very-british-coup">A Very British Coup</a>, hilarious and depressing. A great companion to the TV show "Yes, Minister"</li>
<li><a href="https://www.goodreads.com/book/show/8862633-mort">Mort</a>, Terry Pratchett is a very funny author</li>
<li><a href="https://www.goodreads.com/book/show/18625885-selected-stories-of-philip-k-dick">Selected Stories of Philip K Dick</a>, depressing and dystopian but very well written. I would not read again because it's too depressing</li>
</ul>
<h4 id="the-rest">The rest</h4><ul>
<li><a href="https://www.goodreads.com/book/show/51135871-there-and-never-ever-back-again">There and NEVER, EVER BACK AGAIN: A Dark Lord's Diary</a>, I was looking for more parodies like Bored of the Rings (which itself wasn't great). This was worse</li>
</ul>
<h3 id="2022">2022</h3><p>This year I'm interested in continuing to find good business books and
good books on the history of tech. I'm also getting into more American
history to make up for all the years of not paying attention in high
school.</p>
<p>I'm continuing to try to find good memoirs and fiction by non-English
authors.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Starting the blog-year off gently with my recap of 2021 in books.<br><br>I spent too much time watching TV and trying new video games to keep up with past years 😅<a href="https://t.co/5mfXbBnihk">https://t.co/5mfXbBnihk</a> <a href="https://t.co/ZHmPsUcr3g">pic.twitter.com/ZHmPsUcr3g</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1478764597033283591?ref_src=twsrc%5Etfw">January 5, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/year-in-books-2021.htmlWed, 05 Jan 2022 00:00:00 +0000
- Writing a minimal Lua implementation with a virtual machine from scratch in Rusthttp://notes.eatonphil.com/lua-in-rust.html<p>By the end of this guide we'll have a minimal, working implementation
of a small part of Lua from scratch. It will be able to run the
following program (among others):</p>
<div class="highlight"><pre><span></span><span class="kr">function</span> <span class="nf">fib</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="kr">if</span> <span class="n">n</span> <span class="o"><</span> <span class="mi">2</span> <span class="kr">then</span>
<span class="kr">return</span> <span class="n">n</span><span class="p">;</span>
<span class="kr">end</span>
<span class="kd">local</span> <span class="n">n1</span> <span class="o">=</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span>
<span class="kd">local</span> <span class="n">n2</span> <span class="o">=</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">2</span><span class="p">);</span>
<span class="kr">return</span> <span class="n">n1</span> <span class="o">+</span> <span class="n">n2</span><span class="p">;</span>
<span class="kr">end</span>
<span class="nb">print</span><span class="p">(</span><span class="n">fib</span><span class="p">(</span><span class="mi">30</span><span class="p">));</span>
</pre></div>
<p>This is my second project in Rust and only the third time I've
invented an instruction set so don't take my style as gospel. However,
I have found some Rust parsing tutorials overly complex so I'm hoping
you'll find this one simpler.</p>
<p>All <a href="https://github.com/eatonphil/lust">source code is available on GitHub</a>.</p>
<h3 id="entrypoint">Entrypoint</h3><p>Running <code>cargo init</code> will give the boilerplate necessary. In
<code>src/main.rs</code> we'll accept a file name from the command line, perform
lexical analysis to retrieve all tokens from the file, perform grammar
analysis on the tokens to retrieve a tree structure, compile the tree
to a linear set of virtual machine instructions, and finally interpret
the virtual machine instructions.</p>
<div class="highlight"><pre><span></span><span class="k">mod</span> <span class="nn">eval</span><span class="p">;</span>
<span class="k">mod</span> <span class="nn">lex</span><span class="p">;</span>
<span class="k">mod</span> <span class="nn">parse</span><span class="p">;</span>
<span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">env</span><span class="p">;</span>
<span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">fs</span><span class="p">;</span>
<span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">args</span>: <span class="nb">Vec</span><span class="o"><</span><span class="nb">String</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">env</span>::<span class="n">args</span><span class="p">().</span><span class="n">collect</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">contents</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fs</span>::<span class="n">read_to_string</span><span class="p">(</span><span class="o">&</span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]).</span><span class="n">expect</span><span class="p">(</span><span class="s">"Could not read file"</span><span class="p">);</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">raw</span>: <span class="nb">Vec</span><span class="o"><</span><span class="kt">char</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">contents</span><span class="p">.</span><span class="n">chars</span><span class="p">().</span><span class="n">collect</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">lex</span>::<span class="n">lex</span><span class="p">(</span><span class="o">&</span><span class="n">raw</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">Ok</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span>
<span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="fm">panic!</span><span class="p">(</span><span class="s">"{}"</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="p">),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">ast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">parse</span>::<span class="n">parse</span><span class="p">(</span><span class="o">&</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">Ok</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">ast</span><span class="p">,</span>
<span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="fm">panic!</span><span class="p">(</span><span class="s">"{}"</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="p">),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">pgrm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eval</span>::<span class="n">compile</span><span class="p">(</span><span class="o">&</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">ast</span><span class="p">);</span>
<span class="w"> </span><span class="n">eval</span>::<span class="n">eval</span><span class="p">(</span><span class="n">pgrm</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Easy peasy. Now let's implement <code>lex</code>.</p>
<h3 id="lexical-analysis">Lexical analysis</h3><p>Lexical analysis drops whitespace (Lua is not whitespace
sensitive) and chunks all source code characters into their
smallest possible meaningful pieces like commas, numbers, identifiers,
keywords, etc.</p>
<p>In order to have useful error messages, we'll keep track of our
position in the file with a <code>Location</code> struct that implements <code>increment</code> and
<code>debug</code>.</p>
<p>This goes in <code>src/lex.rs</code>.</p>
<div class="highlight"><pre><span></span><span class="cp">#[derive(Copy, Clone, Debug)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Location</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">col</span>: <span class="kt">i32</span><span class="p">,</span>
<span class="w"> </span><span class="n">line</span>: <span class="kt">i32</span><span class="p">,</span>
<span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">,</span>
<span class="p">}</span>
</pre></div>
<p>The <code>increment</code> function will update line and column numbers as well
as the current index in the file.</p>
<div class="highlight"><pre><span></span><span class="kd">impl</span><span class="w"> </span><span class="nx">Location</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">fn</span><span class="w"> </span><span class="nx">increment</span><span class="p">(</span><span class="o">&</span><span class="kp">self</span><span class="p">,</span><span class="w"> </span><span class="nx">newline</span><span class="p">:</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="nx">Location</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">newline</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Location</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">index</span><span class="p">:</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="nx">col</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="nx">line</span><span class="p">:</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">line</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Location</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">index</span><span class="p">:</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="nx">col</span><span class="p">:</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="nx">line</span><span class="p">:</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">line</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And the <code>debug</code> function will return a message followed by the
current line of source and a caret pointing at the current column.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">pub</span><span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">debug</span><span class="o"><</span><span class="nl">S</span><span class="p">:</span><span class="w"> </span><span class="k">Into</span><span class="o"><</span><span class="n">String</span><span class="o">>></span><span class="p">(</span><span class="o">&</span><span class="n">self</span><span class="p">,</span><span class="w"> </span><span class="nl">raw</span><span class="p">:</span><span class="w"> </span><span class="o">&[</span><span class="n">char</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="nl">msg</span><span class="p">:</span><span class="w"> </span><span class="n">S</span><span class="p">)</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">let</span><span class="w"> </span><span class="n">mut</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">let</span><span class="w"> </span><span class="n">mut</span><span class="w"> </span><span class="n">line_str</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nl">String</span><span class="p">:</span><span class="err">:</span><span class="k">new</span><span class="p">();</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Find</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">whole</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">original</span><span class="w"> </span><span class="n">source</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">raw</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">*</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'\n'</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Done</span><span class="w"> </span><span class="n">discovering</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">question</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="err">!</span><span class="n">line_str</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">line</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="err">{</span>
<span class="w"> </span><span class="n">line_str</span><span class="p">.</span><span class="n">push_str</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="err">}</span>
<span class="w"> </span><span class="n">let</span><span class="w"> </span><span class="nf">space</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">" "</span><span class="p">.</span><span class="n">repeat</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">col</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">usize</span><span class="p">);</span>
<span class="w"> </span><span class="nf">format</span><span class="err">!</span><span class="p">(</span><span class="ss">"{}\n\n{}\n{}^ Near here"</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="p">.</span><span class="k">into</span><span class="p">(),</span><span class="w"> </span><span class="n">line_str</span><span class="p">,</span><span class="w"> </span><span class="nf">space</span><span class="p">)</span>
<span class="w"> </span><span class="err">}</span>
<span class="err">}</span>
</pre></div>
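<p>To see how these two methods behave together, here is a minimal
standalone sketch. It repeats <code>Location</code> from above and adds a
hypothetical <code>walk_to</code> helper (my own, not part of this project)
that increments over the source up to a given index before asking
<code>debug</code> for a message.</p>

```rust
// Standalone copy of Location from this post, for illustration only.
#[derive(Copy, Clone, Debug)]
struct Location {
    col: i32,
    line: i32,
    index: usize,
}

impl Location {
    // Returns the location one character further along.
    fn increment(&self, newline: bool) -> Location {
        if newline {
            Location { index: self.index + 1, col: 0, line: self.line + 1 }
        } else {
            Location { index: self.index + 1, col: self.col + 1, line: self.line }
        }
    }

    // Returns the message, the source line in question, and a caret.
    fn debug<S: Into<String>>(&self, raw: &[char], msg: S) -> String {
        let mut line = 0;
        let mut line_str = String::new();
        // Find the whole line of original source
        for c in raw {
            if *c == '\n' {
                line += 1;
                // Done discovering line in question
                if !line_str.is_empty() {
                    break;
                }
                continue;
            }
            if self.line == line {
                line_str.push_str(&c.to_string());
            }
        }
        let space = " ".repeat(self.col as usize);
        format!("{}\n\n{}\n{}^ Near here", msg.into(), line_str, space)
    }
}

// Hypothetical helper (an assumption of this sketch): step through `raw`
// one character at a time until reaching `target`.
fn walk_to(raw: &[char], target: usize) -> Location {
    let mut loc = Location { col: 0, line: 0, index: 0 };
    while loc.index < target && loc.index < raw.len() {
        loc = loc.increment(raw[loc.index] == '\n');
    }
    loc
}

fn main() {
    let raw: Vec<char> = "local a = 1;\nprint(a);".chars().collect();
    // Index 18 is the `(` on the second line.
    let loc = walk_to(&raw, 18);
    println!("{:?}", loc);
    println!("{}", loc.debug(&raw, "Something went wrong:"));
}
```

<p>Lines and columns are both zero-indexed here, so with the two-line
input in <code>main</code> the caret should land under the <code>(</code> of
<code>print(a);</code>.</p>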
<p>The smallest individual unit after lexical analysis is a token which
is either a keyword, number, identifier, operator, or syntax. (This
implementation is clearly skipping lots of real Lua syntax like
strings.)</p>
<div class="highlight"><pre><span></span><span class="cp">#[derive(Debug, PartialEq, Eq, Clone)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">enum</span> <span class="nc">TokenKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Identifier</span><span class="p">,</span>
<span class="w"> </span><span class="n">Syntax</span><span class="p">,</span>
<span class="w"> </span><span class="n">Keyword</span><span class="p">,</span>
<span class="w"> </span><span class="n">Number</span><span class="p">,</span>
<span class="w"> </span><span class="n">Operator</span><span class="p">,</span>
<span class="p">}</span>
<span class="cp">#[derive(Debug, Clone)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Token</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">value</span>: <span class="nb">String</span><span class="p">,</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span><span class="p">,</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">loc</span>: <span class="nc">Location</span><span class="p">,</span>
<span class="p">}</span>
</pre></div>
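<p>As a rough illustration of what the lexer should eventually produce,
here is a hand-written token stream for the source text
<code>return n + 1;</code>. The <code>tok</code> helper is hypothetical, and
the kind I give each piece (for example <code>;</code> as
<code>Syntax</code>) is an assumption of this sketch rather than
something the definitions above guarantee.</p>

```rust
#[derive(Copy, Clone, Debug)]
struct Location {
    col: i32,
    line: i32,
    index: usize,
}

#[derive(Debug, PartialEq, Eq, Clone)]
enum TokenKind {
    Identifier,
    Syntax,
    Keyword,
    Number,
    Operator,
}

#[derive(Debug, Clone)]
struct Token {
    value: String,
    kind: TokenKind,
    loc: Location,
}

// Hypothetical helper, only for this sketch: build a token on line 0
// whose column and index are the same.
fn tok(value: &str, kind: TokenKind, col: i32) -> Token {
    Token {
        value: value.to_string(),
        kind,
        loc: Location { col, line: 0, index: col as usize },
    }
}

// Hand-written tokens for the source text `return n + 1;`.
// Whitespace does not appear: it is dropped during lexing.
fn example_tokens() -> Vec<Token> {
    vec![
        tok("return", TokenKind::Keyword, 0),
        tok("n", TokenKind::Identifier, 7),
        tok("+", TokenKind::Operator, 9),
        tok("1", TokenKind::Number, 11),
        tok(";", TokenKind::Syntax, 12),
    ]
}

fn main() {
    for t in example_tokens() {
        println!("{:?} {:?} at column {}", t.kind, t.value, t.loc.col);
    }
}
```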
<p>The top-level <code>lex</code> function will iterate over the file and call a lex
helper for each kind of token, returning a vector of all tokens on
success. Before each token it will "eat whitespace". If no helper
matches at the current location, it returns an error.</p>
<div class="highlight"><pre><span></span><span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">lex</span><span class="p">(</span><span class="n">s</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">])</span><span class="w"> </span>-> <span class="nb">Result</span><span class="o"><</span><span class="nb">Vec</span><span class="o"><</span><span class="n">Token</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="nb">String</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Location</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">col</span>: <span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="n">index</span>: <span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="n">line</span>: <span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">s</span><span class="p">.</span><span class="n">len</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">tokens</span>: <span class="nb">Vec</span><span class="o"><</span><span class="n">Token</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[];</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">lexers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="n">lex_keyword</span><span class="p">,</span>
<span class="w"> </span><span class="n">lex_identifier</span><span class="p">,</span>
<span class="w"> </span><span class="n">lex_number</span><span class="p">,</span>
<span class="w"> </span><span class="n">lex_syntax</span><span class="p">,</span>
<span class="w"> </span><span class="n">lex_operator</span><span class="p">,</span>
<span class="w"> </span><span class="p">];</span>
<span class="w"> </span><span class="o">'</span><span class="na">outer</span>: <span class="nc">while</span><span class="w"> </span><span class="n">loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="n">loc</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">lexer</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">lexers</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexer</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="n">loc</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">t</span><span class="p">,</span><span class="w"> </span><span class="n">next_loc</span><span class="p">))</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">;</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">t</span><span class="p">);</span>
<span class="w"> </span><span class="k">continue</span><span class="w"> </span><span class="nl">'outer</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="n">loc</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="s">"Unrecognized character while lexing:"</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">Ok</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
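<p>The dispatch loop can be tried out on its own before the real helpers
exist. Below is a minimal sketch of the same shape with two toy lexers
that only understand ASCII digits and ASCII letters. Everything named
here (<code>Lexer</code>, <code>lex_while</code>,
<code>skip_whitespace</code>, and the simplified error message) is a
stand-in I made up for this sketch, not this project's code.</p>

```rust
#[derive(Copy, Clone, Debug)]
struct Location {
    col: i32,
    line: i32,
    index: usize,
}

#[derive(Debug, PartialEq, Eq, Clone)]
enum TokenKind {
    Identifier,
    Number,
}

#[derive(Debug, Clone)]
struct Token {
    value: String,
    kind: TokenKind,
    loc: Location,
}

// Every lex helper has this shape: Some((token, next location)) or None.
type Lexer = fn(&[char], Location) -> Option<(Token, Location)>;

// Hypothetical helper: consume characters for as long as `pred` holds.
fn lex_while(
    raw: &[char],
    start: Location,
    kind: TokenKind,
    pred: fn(char) -> bool,
) -> Option<(Token, Location)> {
    let mut end = start;
    while end.index < raw.len() && pred(raw[end.index]) {
        end.index += 1;
        end.col += 1;
    }
    if end.index == start.index {
        return None; // Nothing matched, let the next lexer try.
    }
    let value: String = raw[start.index..end.index].iter().collect();
    Some((Token { value, kind, loc: start }, end))
}

fn lex_number(raw: &[char], loc: Location) -> Option<(Token, Location)> {
    lex_while(raw, loc, TokenKind::Number, |c| c.is_ascii_digit())
}

fn lex_identifier(raw: &[char], loc: Location) -> Option<(Token, Location)> {
    lex_while(raw, loc, TokenKind::Identifier, |c| c.is_ascii_alphabetic())
}

// Toy whitespace skipper that never reads past the end of the input.
fn skip_whitespace(raw: &[char], mut loc: Location) -> Location {
    while loc.index < raw.len() && raw[loc.index].is_whitespace() {
        if raw[loc.index] == '\n' {
            loc.line += 1;
            loc.col = 0;
        } else {
            loc.col += 1;
        }
        loc.index += 1;
    }
    loc
}

fn lex(s: &[char]) -> Result<Vec<Token>, String> {
    let lexers: [Lexer; 2] = [lex_number, lex_identifier];
    let mut loc = Location { col: 0, line: 0, index: 0 };
    let mut tokens = vec![];
    'outer: while loc.index < s.len() {
        loc = skip_whitespace(s, loc);
        if loc.index == s.len() {
            break;
        }
        // The first lexer that matches wins, then we start over.
        for lexer in lexers.iter() {
            if let Some((t, next_loc)) = lexer(s, loc) {
                loc = next_loc;
                tokens.push(t);
                continue 'outer;
            }
        }
        return Err(format!("Unrecognized character at line {}, column {}", loc.line, loc.col));
    }
    Ok(tokens)
}

fn main() {
    let raw: Vec<char> = "abc 42\nx".chars().collect();
    for t in lex(&raw).expect("lexing failed") {
        println!("{:?} {:?} at {}:{}", t.kind, t.value, t.loc.line, t.loc.col);
    }
}
```

<p>Because the first matching lexer wins, the order of the array
matters as soon as two helpers can match at the same location.</p>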
<h4 id="whitespace">Whitespace</h4><p>Eating whitespace is just incrementing the location while we see a
space, tab, newline, etc.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">eat_whitespace</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-> <span class="nc">Location</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">[</span><span class="sc">' '</span><span class="p">,</span><span class="w"> </span><span class="sc">'\n'</span><span class="p">,</span><span class="w"> </span><span class="sc">'\r'</span><span class="p">,</span><span class="w"> </span><span class="sc">'\t'</span><span class="p">].</span><span class="n">contains</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\n'</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">next_loc</span>
<span class="p">}</span>
</pre></div>
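<p>To make the line and column bookkeeping concrete, here is a small self-contained sketch that runs <code>eat_whitespace</code> over a short input. The <code>Location</code> struct and its <code>increment</code> method are written out here as an assumption about their shape (an index plus line and column counters); treat the field layout as illustrative.</p>

```rust
// Assumed shape of Location: an index into the input plus line/column
// counters. The exact fields are a guess made for illustration.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Location {
    col: usize,
    line: usize,
    index: usize,
}

impl Location {
    // Advance one character; a newline resets the column and bumps the line.
    fn increment(&self, newline: bool) -> Location {
        if newline {
            Location { index: self.index + 1, col: 0, line: self.line + 1 }
        } else {
            Location { index: self.index + 1, col: self.col + 1, line: self.line }
        }
    }
}

fn eat_whitespace(raw: &[char], initial_loc: Location) -> Location {
    let mut c = raw[initial_loc.index];
    let mut next_loc = initial_loc;
    while [' ', '\n', '\r', '\t'].contains(&c) {
        next_loc = next_loc.increment(c == '\n');
        // Trailing whitespace: stop before reading past the input.
        if next_loc.index == raw.len() {
            break;
        }
        c = raw[next_loc.index];
    }
    next_loc
}

fn main() {
    // Two spaces, a newline, a tab, then the first real character.
    let raw: Vec<char> = "  \n\tx".chars().collect();
    let start = Location { col: 0, line: 0, index: 0 };
    let loc = eat_whitespace(&raw, start);
    // The newline moved us to line 1; the tab is column 0 of that line.
    assert_eq!(loc, Location { col: 1, line: 1, index: 4 });
    println!("{:?}", loc);
}
```

<p>The returned location points at the first non-whitespace character, which is where the next lexer picks up.</p>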
<h4 id="numbers">Numbers</h4><p>Lexing numbers iterates through the source starting at a position
until it stops seeing decimal digits (this implementation only
supports integers).</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">lex_number</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">Location</span><span class="p">)</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">ident</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">is_digit</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ident</span><span class="p">.</span><span class="n">push_str</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span>
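<span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Stop at the end of the input instead of indexing past it.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span>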
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>If there are no digits in the string then this is not a number.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">ident</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">Some</span><span class="p">((</span>
<span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">value</span>: <span class="nc">ident</span><span class="p">,</span>
<span class="w"> </span><span class="n">loc</span>: <span class="nc">initial_loc</span><span class="p">,</span>
<span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span>::<span class="n">Number</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="n">next_loc</span><span class="p">,</span>
<span class="w"> </span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">None</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
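<p>As a usage sketch, here is a compact variant of the same idea written with <code>slice::get</code>, which returns <code>None</code> past the end of the slice, so a number that ends the input cannot index out of bounds. The <code>Location</code>, <code>Token</code>, and <code>TokenKind</code> definitions below are stand-ins with assumed fields, included only so the block compiles on its own.</p>

```rust
// Stand-ins for the post's types; the exact fields are assumptions.
#[derive(Copy, Clone, Debug, PartialEq)]
struct Location {
    col: usize,
    line: usize,
    index: usize,
}

impl Location {
    fn increment(&self, newline: bool) -> Location {
        if newline {
            Location { index: self.index + 1, col: 0, line: self.line + 1 }
        } else {
            Location { index: self.index + 1, col: self.col + 1, line: self.line }
        }
    }
}

#[derive(Debug, PartialEq)]
enum TokenKind {
    Number,
}

#[derive(Debug, PartialEq)]
struct Token {
    value: String,
    kind: TokenKind,
    loc: Location,
}

fn lex_number(raw: &[char], initial_loc: Location) -> Option<(Token, Location)> {
    let mut value = String::new();
    let mut next_loc = initial_loc;
    // `get` returns None past the end of the slice, so a number that
    // ends the input stops the loop instead of panicking.
    while let Some(&c) = raw.get(next_loc.index) {
        if !c.is_digit(10) {
            break;
        }
        value.push(c);
        next_loc = next_loc.increment(false);
    }
    if value.is_empty() {
        return None;
    }
    let token = Token { value, kind: TokenKind::Number, loc: initial_loc };
    Some((token, next_loc))
}

fn main() {
    let raw: Vec<char> = "x = 42".chars().collect();
    let start = Location { col: 4, line: 0, index: 4 };
    let (token, next) = lex_number(&raw, start).expect("digits at index 4");
    assert_eq!(token.value, "42");
    assert_eq!(next.index, 6); // one past the last digit
    // Index 0 holds `x`, so this lexer declines and another one gets a turn.
    assert!(lex_number(&raw, Location { col: 0, line: 0, index: 0 }).is_none());
}
```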
<h4 id="identifiers">Identifiers</h4><p>Identifiers are any sequence of alphabetic characters, digits, and
underscores.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">lex_identifier</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">Location</span><span class="p">)</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">ident</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">is_alphanumeric</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'_'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ident</span><span class="p">.</span><span class="n">push_str</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span>
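<span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Stop at the end of the input instead of indexing past it.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span>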
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>But they cannot start with a number.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// First character must not be a digit</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">ident</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="o">!</span><span class="n">ident</span><span class="p">.</span><span class="n">chars</span><span class="p">().</span><span class="n">next</span><span class="p">().</span><span class="n">unwrap</span><span class="p">().</span><span class="n">is_digit</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">Some</span><span class="p">((</span>
<span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">value</span>: <span class="nc">ident</span><span class="p">,</span>
<span class="w"> </span><span class="n">loc</span>: <span class="nc">initial_loc</span><span class="p">,</span>
<span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span>::<span class="n">Identifier</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="n">next_loc</span><span class="p">,</span>
<span class="w"> </span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">None</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
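<p>The "cannot start with a number" rule can also be checked up front, before collecting any characters. The sketch below is one way to do that; the helper names <code>is_identifier_start</code>, <code>is_identifier_part</code>, and <code>identifier_len</code> are hypothetical, and it returns only a length to keep the example small. As a slight simplification, it accepts only alphabetic characters and underscores as the first character.</p>

```rust
// Hypothetical helpers: which characters may begin or continue an identifier.
fn is_identifier_start(c: char) -> bool {
    c.is_alphabetic() || c == '_'
}

fn is_identifier_part(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

// Number of characters in the identifier starting at `start`,
// or 0 if no identifier starts there.
fn identifier_len(raw: &[char], start: usize) -> usize {
    match raw.get(start) {
        // Only count characters once the first one is a valid start.
        Some(&c) if is_identifier_start(c) => raw[start..]
            .iter()
            .take_while(|&&c| is_identifier_part(c))
            .count(),
        _ => 0,
    }
}

fn main() {
    let raw: Vec<char> = "foo_1 = 2".chars().collect();
    assert_eq!(identifier_len(&raw, 0), 5); // "foo_1"
    assert_eq!(identifier_len(&raw, 8), 0); // "2" starts with a digit
}
```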
<h4 id="keywords">Keywords</h4><p>Keywords are alphabetic, like identifiers, but they are reserved: the
user cannot use them as variable names.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">lex_keyword</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">Location</span><span class="p">)</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">syntax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s">"function"</span><span class="p">,</span><span class="w"> </span><span class="s">"end"</span><span class="p">,</span><span class="w"> </span><span class="s">"if"</span><span class="p">,</span><span class="w"> </span><span class="s">"then"</span><span class="p">,</span><span class="w"> </span><span class="s">"local"</span><span class="p">,</span><span class="w"> </span><span class="s">"return"</span><span class="p">];</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="o">'</span><span class="na">outer</span>: <span class="nc">for</span><span class="w"> </span><span class="n">possible_syntax</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">syntax</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">is_alphanumeric</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'_'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">push_str</span><span class="p">(</span><span class="o">&</span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span>
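<span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// A run longer than the keyword (like function1) cannot match,</span>
<span class="w"> </span><span class="c1">// and slicing past the end of the keyword would panic.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">possible_syntax</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">possible_syntax</span><span class="p">[</span><span class="o">..</span><span class="n">n</span><span class="p">]</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="k">continue</span><span class="w"> </span><span class="nl">'outer</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Stop at the end of the input instead of indexing past it.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span>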
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Not a complete match</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">possible_syntax</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// If it got to this point it found a match, so exit early.</span>
<span class="w"> </span><span class="c1">// We don't need a longest match.</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Aside from matching against a list of strings, we have to make sure
the match is complete. For example, <code>function1</code> is not a keyword;
it's a valid identifier. Whereas <code>function 1</code> is a valid sequence of tokens
(the keyword <code>function</code> and the number <code>1</code>), even though it isn't
valid Lua.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// If the next character would be part of a valid identifier, then</span>
<span class="w"> </span><span class="c1">// this is not a keyword.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">raw</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">next_c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_c</span><span class="p">.</span><span class="n">is_alphanumeric</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">next_c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'_'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">Some</span><span class="p">((</span>
<span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">value</span>: <span class="nc">value</span><span class="p">,</span>
<span class="w"> </span><span class="n">loc</span>: <span class="nc">initial_loc</span><span class="p">,</span>
<span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span>::<span class="n">Keyword</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="n">next_loc</span><span class="p">,</span>
<span class="w"> </span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
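<p>For comparison, here is a different technique from the character-by-character prefix matching above: read the whole identifier-like run first, then look it up in the keyword list. It is only a sketch, and <code>keyword_at</code> is a hypothetical helper that returns the matched keyword rather than a <code>Token</code>. Because the full run is compared, <code>function1</code> is rejected without a separate lookahead check.</p>

```rust
const KEYWORDS: [&str; 6] = ["function", "end", "if", "then", "local", "return"];

// Hypothetical helper: returns the keyword at `start` only if the entire
// identifier-like run beginning there is exactly a keyword.
fn keyword_at(raw: &[char], start: usize) -> Option<&'static str> {
    // Collect every character that could belong to an identifier.
    let word: String = raw[start..]
        .iter()
        .take_while(|c| c.is_alphanumeric() || **c == '_')
        .collect();
    // A keyword must match the whole run, not just a prefix of it.
    KEYWORDS.iter().copied().find(|k| *k == word)
}

fn main() {
    let with_space: Vec<char> = "function 1".chars().collect();
    let no_space: Vec<char> = "function1".chars().collect();
    assert_eq!(keyword_at(&with_space, 0), Some("function"));
    assert_eq!(keyword_at(&no_space, 0), None);
}
```

<p>The trade-off is an extra allocation for the collected word, which is unlikely to matter for a toy lexer.</p>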
<h4 id="syntax">Syntax</h4><p>Syntax (in this context) is just language junk that isn't
operators. Things like commas, parentheses, etc.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">lex_syntax</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">Location</span><span class="p">)</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">syntax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s">";"</span><span class="p">,</span><span class="w"> </span><span class="s">"="</span><span class="p">,</span><span class="w"> </span><span class="s">"("</span><span class="p">,</span><span class="w"> </span><span class="s">")"</span><span class="p">,</span><span class="w"> </span><span class="s">","</span><span class="p">];</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">possible_syntax</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">syntax</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// TODO: this won't work with multiple-character syntax bits like >= or ==</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">possible_syntax</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span>
<span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">value</span>: <span class="nc">possible_syntax</span><span class="p">.</span><span class="n">to_string</span><span class="p">(),</span>
<span class="w"> </span><span class="n">loc</span>: <span class="nc">initial_loc</span><span class="p">,</span>
<span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span>::<span class="n">Syntax</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="n">next_loc</span><span class="p">,</span>
<span class="w"> </span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">None</span>
<span class="p">}</span>
</pre></div>
<h4 id="operators">Operators</h4><p>Operators are things like plus, minus, and less than
symbols. Operators are syntax but it helps us later on to break these
out into a separate type of token.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">lex_operator</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">Location</span><span class="p">)</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">operators</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s">"+"</span><span class="p">,</span><span class="w"> </span><span class="s">"-"</span><span class="p">,</span><span class="w"> </span><span class="s">"<"</span><span class="p">];</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">possible_syntax</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">operators</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// TODO: this won't work with multiple-character operators like >= or ==</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">possible_syntax</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span>
<span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">value</span>: <span class="nc">possible_syntax</span><span class="p">.</span><span class="n">to_string</span><span class="p">(),</span>
<span class="w"> </span><span class="n">loc</span>: <span class="nc">initial_loc</span><span class="p">,</span>
<span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span>::<span class="n">Operator</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="n">next_loc</span><span class="p">,</span>
<span class="w"> </span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">None</span>
<span class="p">}</span>
</pre></div>
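<p>The TODO comments note that single-character comparison can't handle operators like <code>>=</code> or <code>==</code>. One possible way to lift that limit is to compare a slice of the input against each candidate and list longer operators first, so the longest match wins. The sketch below assumes a hypothetical helper <code>operator_at</code> and an extended operator list that this lexer does not currently have.</p>

```rust
// Hypothetical extended list. Longer operators come first so that ">="
// is tried before ">" and wins when both would match.
const OPERATORS: [&str; 7] = ["==", ">=", "<=", "+", "-", "<", ">"];

// Returns the operator starting at `start`, if any.
fn operator_at(raw: &[char], start: usize) -> Option<&'static str> {
    OPERATORS.iter().copied().find(|op| {
        let end = start + op.chars().count();
        // Check bounds before slicing so short inputs cannot panic.
        end <= raw.len() && raw[start..end].iter().copied().eq(op.chars())
    })
}

fn main() {
    let raw: Vec<char> = "a >= b".chars().collect();
    assert_eq!(operator_at(&raw, 2), Some(">="));
    assert_eq!(operator_at(&raw, 0), None);
}
```

<p>If a change like this were adopted, <code>==</code> would need to be tried before the lone <code>=</code> handled by <code>lex_syntax</code>; otherwise it would be split into two tokens.</p>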
<p>And now we're all done lexing!</p>
<h3 id="grammar-analysis">Grammar analysis</h3><p>Parsing finds grammatical (tree) patterns in a flat list of
tokens. The resulting tree is called a syntax tree or abstract syntax tree (AST).</p>
<p>The boring part is defining the tree. Generally speaking (and
specifically for this project), the syntax tree is a list of
statements. Statements can be function definitions or expression
statements or if statements or return statements or local
declarations.</p>
<p>This goes in <code>src/parse.rs</code>.</p>
<div class="highlight"><pre><span></span><span class="cp">#[derive(Debug)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">enum</span> <span class="nc">Statement</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Expression</span><span class="p">(</span><span class="n">Expression</span><span class="p">),</span>
<span class="w"> </span><span class="n">If</span><span class="p">(</span><span class="n">If</span><span class="p">),</span>
<span class="w"> </span><span class="n">FunctionDeclaration</span><span class="p">(</span><span class="n">FunctionDeclaration</span><span class="p">),</span>
<span class="w"> </span><span class="n">Return</span><span class="p">(</span><span class="n">Return</span><span class="p">),</span>
<span class="w"> </span><span class="n">Local</span><span class="p">(</span><span class="n">Local</span><span class="p">),</span>
<span class="p">}</span>
<span class="k">pub</span><span class="w"> </span><span class="k">type</span> <span class="nc">Ast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Vec</span><span class="o"><</span><span class="n">Statement</span><span class="o">></span><span class="p">;</span>
</pre></div>
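<p>To picture what "a list of statements" looks like as a tree, here is a hand-built example for a two-line program. The types in this sketch are simplified stand-ins, not the definitions used in this project: they hold plain strings where the real tree holds tokens, and the variant fields are assumptions made for illustration.</p>

```rust
// Simplified stand-ins: plain strings where the real tree carries tokens.
#[derive(Debug, PartialEq)]
enum Expression {
    Number(String),
    Identifier(String),
    BinaryOperation {
        operator: String,
        left: Box<Expression>,
        right: Box<Expression>,
    },
}

#[derive(Debug, PartialEq)]
enum Statement {
    Local { name: String, expression: Expression },
    Return(Expression),
}

type Ast = Vec<Statement>;

// The tree a parser would be expected to produce for:
//   local x = 1 + 2;
//   return x;
fn example_ast() -> Ast {
    vec![
        Statement::Local {
            name: "x".to_string(),
            expression: Expression::BinaryOperation {
                operator: "+".to_string(),
                left: Box::new(Expression::Number("1".to_string())),
                right: Box::new(Expression::Number("2".to_string())),
            },
        },
        Statement::Return(Expression::Identifier("x".to_string())),
    ]
}

fn main() {
    let ast = example_ast();
    // Two statements at the top level; the nesting lives inside each one.
    assert_eq!(ast.len(), 2);
    println!("{:#?}", ast);
}
```

<p>The flat token list for the first line (<code>local</code>, <code>x</code>, <code>=</code>, <code>1</code>, <code>+</code>, <code>2</code>, <code>;</code>) becomes a single statement whose expression is itself a small tree.</p>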
<p>There's almost nothing special at all about the rest of the tree
definitions.</p>
<div class="highlight"><pre><span></span><span class="cp">#[derive(Debug)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">enum</span> <span class="nc">Literal</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Identifier</span><span class="p">(</span><span class="n">Token</span><span class="p">),</span>
<span class="w"> </span><span class="n">Number</span><span class="p">(</span><span class="n">Token</span><span class="p">),</span>
<span class="p">}</span>
<span class="cp">#[derive(Debug)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">FunctionCall</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">name</span>: <span class="nc">Token</span><span class="p">,</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">arguments</span>: <span class="nb">Vec</span><span class="o"><</span><span class="n">Expression</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
<span class="cp">#[derive(Debug)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">BinaryOperation</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">operator</span>: <span class="nc">Token</span><span class="p">,</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">left</span>: <span class="nb">Box</span><span class="o"><</span><span class="n">Expression</span><span class="o">></span><span class="p">,</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">right</span>: <span class="nb">Box</span><span class="o"><</span><span class="n">Expression</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
<span class="cp">#[derive(Debug)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">enum</span> <span class="nc">Expression</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">FunctionCall</span><span class="p">(</span><span class="n">FunctionCall</span><span class="p">),</span>
<span class="w"> </span><span class="n">BinaryOperation</span><span class="p">(</span><span class="n">BinaryOperation</span><span class="p">),</span>
<span class="w"> </span><span class="n">Literal</span><span class="p">(</span><span class="n">Literal</span><span class="p">),</span>
<span class="p">}</span>
<span class="cp">#[derive(Debug)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">FunctionDeclaration</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">name</span>: <span class="nc">Token</span><span class="p">,</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">parameters</span>: <span class="nb">Vec</span><span class="o"><</span><span class="n">Token</span><span class="o">></span><span class="p">,</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">body</span>: <span class="nb">Vec</span><span class="o"><</span><span class="n">Statement</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
<span class="cp">#[derive(Debug)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">If</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">test</span>: <span class="nc">Expression</span><span class="p">,</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">body</span>: <span class="nb">Vec</span><span class="o"><</span><span class="n">Statement</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
<span class="cp">#[derive(Debug)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Local</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">name</span>: <span class="nc">Token</span><span class="p">,</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">expression</span>: <span class="nc">Expression</span><span class="p">,</span>
<span class="p">}</span>
<span class="cp">#[derive(Debug)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">expression</span>: <span class="nc">Expression</span><span class="p">,</span>
<span class="p">}</span>
</pre></div>
<p>And that's it for the AST!</p>
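<p>To make the shape of these types concrete, here is a minimal sketch that builds the tree for <code>local x = 1 + 2;</code> by hand. It assumes a simplified <code>Token</code> with no source location, and the <code>tok</code>, <code>sample_local</code> and <code>render</code> helpers are hypothetical names used only for illustration.</p>

```rust
// Simplified stand-ins for the lexer types (assumption: no location info).
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Identifier,
    Number,
    Operator,
}

#[derive(Debug, Clone)]
pub struct Token {
    pub value: String,
    pub kind: TokenKind,
}

#[derive(Debug)]
pub enum Literal {
    Identifier(Token),
    Number(Token),
}

#[derive(Debug)]
pub struct BinaryOperation {
    pub operator: Token,
    pub left: Box<Expression>,
    pub right: Box<Expression>,
}

#[derive(Debug)]
pub enum Expression {
    BinaryOperation(BinaryOperation),
    Literal(Literal),
}

#[derive(Debug)]
pub struct Local {
    pub name: Token,
    pub expression: Expression,
}

// Hypothetical convenience constructor for tokens.
fn tok(value: &str, kind: TokenKind) -> Token {
    Token {
        value: value.to_string(),
        kind,
    }
}

// Hand-built tree for `local x = 1 + 2;`.
pub fn sample_local() -> Local {
    Local {
        name: tok("x", TokenKind::Identifier),
        expression: Expression::BinaryOperation(BinaryOperation {
            operator: tok("+", TokenKind::Operator),
            left: Box::new(Expression::Literal(Literal::Number(tok("1", TokenKind::Number)))),
            right: Box::new(Expression::Literal(Literal::Number(tok("2", TokenKind::Number)))),
        }),
    }
}

// Walk an expression and turn it back into source text.
pub fn render(expr: &Expression) -> String {
    match expr {
        Expression::Literal(Literal::Identifier(t)) | Expression::Literal(Literal::Number(t)) => {
            t.value.clone()
        }
        Expression::BinaryOperation(b) => {
            format!("{} {} {}", render(&b.left), b.operator.value, render(&b.right))
        }
    }
}
```

<p>The <code>Box</code> around <code>left</code> and <code>right</code> is what lets <code>Expression</code> contain itself: without the indirection the type would have no finite size.</p>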
<h4 id="some-helpers">Some helpers</h4><p>Lastly, before the fun part, we'll define a few helpers for validating
each kind of token.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">expect_keyword</span><span class="p">(</span><span class="n">tokens</span>: <span class="kp">&</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">value</span>: <span class="kp">&</span><span class="kt">str</span><span class="p">)</span><span class="w"> </span>-> <span class="kt">bool</span> <span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span>
<span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Keyword</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">value</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">expect_syntax</span><span class="p">(</span><span class="n">tokens</span>: <span class="kp">&</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">value</span>: <span class="kp">&</span><span class="kt">str</span><span class="p">)</span><span class="w"> </span>-> <span class="kt">bool</span> <span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span>
<span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Syntax</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">value</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="nf">expect_identifier</span><span class="p">(</span><span class="n">tokens</span>: <span class="kp">&</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-> <span class="kt">bool</span> <span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span>
<span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Identifier</span>
<span class="p">}</span>
</pre></div>
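<p>All three helpers repeat the same bounds check and clone the token just to inspect it. As an alternative sketch (not what the rest of this post uses), the shared logic could live in one function built on <code>slice::get</code>, which returns <code>None</code> for an out-of-range index. The <code>expect_token</code> name and the trimmed-down <code>Token</code> type here are assumptions for illustration.</p>

```rust
// Simplified stand-ins for the lexer types (assumption: no location info).
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Identifier,
    Syntax,
    Keyword,
}

#[derive(Debug, Clone)]
pub struct Token {
    pub value: String,
    pub kind: TokenKind,
}

// Hypothetical shared helper: true when a token exists at `index`, has the
// given kind, and (when `value` is provided) has exactly that text.
fn expect_token(tokens: &[Token], index: usize, kind: TokenKind, value: Option<&str>) -> bool {
    match tokens.get(index) {
        Some(t) => t.kind == kind && value.map_or(true, |v| t.value == v),
        None => false,
    }
}

fn expect_keyword(tokens: &[Token], index: usize, value: &str) -> bool {
    expect_token(tokens, index, TokenKind::Keyword, Some(value))
}

fn expect_syntax(tokens: &[Token], index: usize, value: &str) -> bool {
    expect_token(tokens, index, TokenKind::Syntax, Some(value))
}

fn expect_identifier(tokens: &[Token], index: usize) -> bool {
    expect_token(tokens, index, TokenKind::Identifier, None)
}
```

<p>The behavior is the same as the three functions above: an index past the end of the slice simply yields <code>false</code>.</p>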
<p>Now on to the fun part, actually detecting these trees!</p>
<h4 id="top-level-parse">Top-level parse</h4><p>The top-level <code>parse</code> function and its major helper,
<code>parse_statement</code>, dispatch very similarly to the top-level lex
function. For each statement in the file we look for function
declarations, if statements, return statements, local declarations,
and expression statements.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_statement</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="kp">&</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">parsers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="n">parse_if</span><span class="p">,</span>
<span class="w"> </span><span class="n">parse_expression_statement</span><span class="p">,</span>
<span class="w"> </span><span class="n">parse_return</span><span class="p">,</span>
<span class="w"> </span><span class="n">parse_function</span><span class="p">,</span>
<span class="w"> </span><span class="n">parse_local</span><span class="p">,</span>
<span class="w"> </span><span class="p">];</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">parser</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">parsers</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parser</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">is_some</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">res</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">None</span>
<span class="p">}</span>
<span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">parse</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="nb">Vec</span><span class="o"><</span><span class="n">Token</span><span class="o">></span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Result</span><span class="o"><</span><span class="n">Ast</span><span class="p">,</span><span class="w"> </span><span class="nb">String</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">ast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[];</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">ntokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">();</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">ntokens</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_statement</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">))</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_index</span><span class="p">;</span>
<span class="w"> </span><span class="n">ast</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">stmt</span><span class="p">);</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">loc</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Invalid token while parsing:"</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb">Ok</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
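<p>The dispatch pattern is easier to see in isolation. Here is a toy sketch of it, assuming tokens are plain <code>&amp;str</code> values and statements are rendered as strings rather than real AST nodes. The two tiny parsers and the <code>Parser</code> alias are made up for illustration: each parser either returns its result plus the index of the next unread token, or <code>None</code> so the next parser gets a turn.</p>

```rust
// Every parser has the same signature, so they can sit in one array.
type Parser = fn(&[&str], usize) -> Option<(String, usize)>;

// Toy parser for `return <value> ;`.
fn parse_return(tokens: &[&str], index: usize) -> Option<(String, usize)> {
    if tokens.get(index).copied() != Some("return") {
        return None;
    }
    let value = tokens.get(index + 1)?;
    if tokens.get(index + 2).copied() != Some(";") {
        return None;
    }
    Some((format!("Return({})", value), index + 3))
}

// Toy parser for `local <name> = <value> ;`.
fn parse_local(tokens: &[&str], index: usize) -> Option<(String, usize)> {
    if tokens.get(index).copied() != Some("local") {
        return None;
    }
    let name = tokens.get(index + 1)?;
    if tokens.get(index + 2).copied() != Some("=") {
        return None;
    }
    let value = tokens.get(index + 3)?;
    if tokens.get(index + 4).copied() != Some(";") {
        return None;
    }
    Some((format!("Local({} = {})", name, value), index + 5))
}

// Try each parser in order and keep the first success.
fn parse_statement(tokens: &[&str], index: usize) -> Option<(String, usize)> {
    let parsers: [Parser; 2] = [parse_return, parse_local];
    parsers.iter().find_map(|parser| parser(tokens, index))
}

// Keep parsing statements until the tokens run out or nothing matches.
pub fn parse(tokens: &[&str]) -> Result<Vec<String>, String> {
    let mut ast = vec![];
    let mut index = 0;
    while index < tokens.len() {
        match parse_statement(tokens, index) {
            Some((stmt, next_index)) => {
                ast.push(stmt);
                index = next_index;
            }
            None => return Err(format!("Invalid token while parsing: {}", tokens[index])),
        }
    }
    Ok(ast)
}
```

<p>Because every parser starts from the same <code>index</code> and only a successful one advances it, a failed attempt costs nothing but time: the next parser simply retries from the same position.</p>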
<h4 id="expression-statements">Expression statements</h4><p>Expression statements are just a wrapper for the Rust type
system. They call <code>parse_expression</code> (which we'll define shortly),
expect a semicolon afterward, and wrap the expression in a statement.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_expression_statement</span><span class="p">(</span>
<span class="w"> </span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span>
<span class="w"> </span><span class="n">tokens</span>: <span class="kp">&</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span>
<span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_expression</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">)</span><span class="o">?</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="p">(</span><span class="n">expr</span><span class="p">,</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="p">;</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">";"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected semicolon after expression:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past semicolon</span>
<span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">Statement</span>::<span class="n">Expression</span><span class="p">(</span><span class="n">expr</span><span class="p">),</span><span class="w"> </span><span class="n">next_index</span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
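<p>One edge case worth keeping in mind: if the input ends right after the expression, there is no token left to point at when reporting the missing semicolon. Here is a minimal sketch of a bounds-safe check, assuming a simplified <code>Token</code> without location info. The <code>describe_position</code> and <code>expect_semicolon</code> names are hypothetical and not part of the parser in this post.</p>

```rust
// Simplified stand-ins for the lexer types (assumption: no location info).
#[derive(Debug, Clone, PartialEq)]
pub enum TokenKind {
    Number,
    Syntax,
}

#[derive(Debug, Clone)]
pub struct Token {
    pub value: String,
    pub kind: TokenKind,
}

// Hypothetical helper: describe where we are without indexing past the end.
fn describe_position(tokens: &[Token], index: usize) -> String {
    match tokens.get(index) {
        Some(t) => format!("near '{}'", t.value),
        None => "at end of input".to_string(),
    }
}

// Returns the index just past the semicolon, or an error message.
fn expect_semicolon(tokens: &[Token], index: usize) -> Result<usize, String> {
    match tokens.get(index) {
        Some(t) if t.kind == TokenKind::Syntax && t.value == ";" => Ok(index + 1),
        _ => Err(format!(
            "Expected semicolon after expression {}",
            describe_position(tokens, index)
        )),
    }
}
```

<p>Returning the message instead of printing it also leaves the caller free to decide whether a failure here is a real error or just a signal to try the next parser.</p>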
<h4 id="expressions">Expressions</h4><p>An expression in this minimal Lua is one of three things: a function call,
a literal (number or identifier), or a binary operation. To keep things
very simple, binary operations cannot be combined. So instead of <code>1 +
2 + 3</code> we'd need to do <code>local tmp1 = 1 + 2; local tmp2 = tmp1 + 3;</code>
and so on.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="kp">&</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">Expression</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Number</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Expression</span>::<span class="n">Literal</span><span class="p">(</span><span class="n">Literal</span>::<span class="n">Number</span><span class="p">(</span><span class="n">t</span><span class="p">)),</span>
<span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Identifier</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Expression</span>::<span class="n">Literal</span><span class="p">(</span><span class="n">Literal</span>::<span class="n">Identifier</span><span class="p">(</span><span class="n">t</span><span class="p">)),</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
<p>If what follows the first literal is an open parenthesis then we try
to parse a function call.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">"("</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past open paren</span>
<span class="w"> </span><span class="c1">// Function call</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">arguments</span>: <span class="nb">Vec</span><span class="o"><</span><span class="n">Expression</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[];</span>
</pre></div>
<p>We need to call <code>parse_expression</code> recursively for every possible
argument passed to the function.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">")"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">arguments</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">","</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected comma between function call arguments:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past comma</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_expression</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">))</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">;</span>
<span class="w"> </span><span class="n">arguments</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">arg</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected valid expression in function call arguments:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past closing paren</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span>
<span class="w"> </span><span class="n">Expression</span>::<span class="n">FunctionCall</span><span class="p">(</span><span class="n">FunctionCall</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">name</span>: <span class="nc">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">clone</span><span class="p">(),</span>
<span class="w"> </span><span class="n">arguments</span><span class="p">,</span>
<span class="w"> </span><span class="p">}),</span>
<span class="w"> </span><span class="n">next_index</span><span class="p">,</span>
<span class="w"> </span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Otherwise, if there isn't an opening parenthesis, we are parsing
either a literal expression or a binary operation. If the next token
is an operator then it's a binary operation. If it isn't (or if we've
run out of tokens), we just return the literal.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// Might be a literal expression</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">clone</span><span class="p">().</span><span class="n">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Operator</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">left</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Otherwise is a binary operation</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">op</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past op</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
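<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="c1">// next_index is out of bounds here, so report at the operator</span>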
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected valid right hand side binary operand:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">rtoken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span>
</pre></div>
<p>It is at this point that we <em>could</em> (but won't) call
<code>parse_expression</code> recursively. I don't want to deal with operator
precedence right now, so we'll just require that the right-hand side is
another literal.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">rtoken</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Number</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Expression</span>::<span class="n">Literal</span><span class="p">(</span><span class="n">Literal</span>::<span class="n">Number</span><span class="p">(</span><span class="n">rtoken</span><span class="p">)),</span>
<span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Identifier</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">Expression</span>::<span class="n">Literal</span><span class="p">(</span><span class="n">Literal</span>::<span class="n">Identifier</span><span class="p">(</span><span class="n">rtoken</span><span class="p">)),</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">rtoken</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected valid right hand side binary operand:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past right hand operand</span>
<span class="w"> </span><span class="nb">Some</span><span class="p">((</span>
<span class="w"> </span><span class="n">Expression</span>::<span class="n">BinaryOperation</span><span class="p">(</span><span class="n">BinaryOperation</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">left</span>: <span class="nb">Box</span>::<span class="n">new</span><span class="p">(</span><span class="n">left</span><span class="p">),</span>
<span class="w"> </span><span class="n">right</span>: <span class="nb">Box</span>::<span class="n">new</span><span class="p">(</span><span class="n">right</span><span class="p">),</span>
<span class="w"> </span><span class="n">operator</span>: <span class="nc">op</span><span class="p">,</span>
<span class="w"> </span><span class="p">}),</span>
<span class="w"> </span><span class="n">next_index</span><span class="p">,</span>
<span class="w"> </span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
<p>And now we're done parsing expressions!</p>
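<p>For the curious, here is a minimal sketch of what handling operator
precedence could look like if we did recurse. It uses precedence
climbing over simplified, hypothetical types (<code>Tok</code> and <code>Expr</code>)
rather than the <code>Token</code> and <code>Expression</code> types in this post, and
the <code>precedence</code> table is an assumption made up for illustration.</p>

```rust
// Simplified stand-ins for the post's Token and Expression types.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Num(i64),
    Op(char),
}

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Binary(Box<Expr>, char, Box<Expr>),
}

// Hypothetical precedence table: a higher number binds tighter.
fn precedence(op: char) -> u8 {
    match op {
        '*' | '/' => 2,
        '+' | '-' => 1,
        _ => 0,
    }
}

// Parse a single number literal at `index`.
fn parse_operand(tokens: &[Tok], index: usize) -> Option<(Expr, usize)> {
    match tokens.get(index)? {
        Tok::Num(n) => Some((Expr::Num(*n), index + 1)),
        Tok::Op(_) => None,
    }
}

// Precedence climbing: only consume operators that bind at least as
// tightly as `min_prec`, recursing to parse the right hand side.
fn parse_binary(tokens: &[Tok], index: usize, min_prec: u8) -> Option<(Expr, usize)> {
    let (mut left, mut next_index) = parse_operand(tokens, index)?;
    while let Some(Tok::Op(op)) = tokens.get(next_index) {
        let prec = precedence(*op);
        if prec < min_prec {
            break;
        }
        // `prec + 1` makes operators of equal precedence left-associative.
        let (right, after_right) = parse_binary(tokens, next_index + 1, prec + 1)?;
        left = Expr::Binary(Box::new(left), *op, Box::new(right));
        next_index = after_right;
    }
    Some((left, next_index))
}

fn main() {
    // 1 + 2 * 3
    let tokens = vec![Tok::Num(1), Tok::Op('+'), Tok::Num(2), Tok::Op('*'), Tok::Num(3)];
    let (expr, next_index) = parse_binary(&tokens, 0, 1).unwrap();
    println!("{:?} (consumed {} tokens)", expr, next_index);
}
```

<p>In this sketch the multiplication ends up nested under the addition
because the recursive call for the right hand side of <code>+</code> is allowed
to keep consuming the tighter-binding <code>*</code>.</p>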
<h4 id="function-declarations">Function declarations</h4><p>Function declarations start with the <code>function</code> keyword, followed by an identifier token for the function name.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_function</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="kp">&</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_keyword</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">"function"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_identifier</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected valid identifier for function name:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span>
</pre></div>
<p>After the function name comes the parameter list, which is either
empty or a comma-separated list of identifiers.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past name</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">"("</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected open parenthesis in function declaration:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past open paren</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">parameters</span>: <span class="nb">Vec</span><span class="o"><</span><span class="n">Token</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[];</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">")"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">parameters</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">","</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">"{}"</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">loc</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected comma or close parenthesis after parameter in function declaration:"</span><span class="p">));</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past comma</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">parameters</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">clone</span><span class="p">());</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past param</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past close paren</span>
</pre></div>
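<p>Note that the loop above pushes whatever token is at
<code>next_index</code> as a parameter without checking that it is an
identifier, and it indexes <code>tokens</code> directly. Here is a sketch of a
stricter variant. It is written against a simplified, hypothetical
<code>Tok</code> type and a standalone <code>parse_parameters</code> helper (neither
exists in this post's code) so that it can run on its own.</p>

```rust
// Simplified stand-in for the post's Token type.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Syntax(char),
}

// Parse `(a, b, c)` starting at the open paren. Returns the parameter
// names and the index just past the close paren, or None on any error.
fn parse_parameters(tokens: &[Tok], index: usize) -> Option<(Vec<String>, usize)> {
    if tokens.get(index) != Some(&Tok::Syntax('(')) {
        return None;
    }
    let mut next_index = index + 1; // Skip past open paren
    let mut parameters: Vec<String> = vec![];
    // `get(..)?` returns None at end of input instead of panicking.
    while tokens.get(next_index)? != &Tok::Syntax(')') {
        if !parameters.is_empty() {
            if tokens.get(next_index)? != &Tok::Syntax(',') {
                return None; // Expected comma or close paren
            }
            next_index += 1; // Skip past comma
        }
        match tokens.get(next_index)? {
            Tok::Ident(name) => parameters.push(name.clone()),
            _ => return None, // Expected an identifier
        }
        next_index += 1; // Skip past param
    }
    next_index += 1; // Skip past close paren
    Some((parameters, next_index))
}

fn main() {
    let tokens = vec![
        Tok::Syntax('('),
        Tok::Ident("a".to_string()),
        Tok::Syntax(','),
        Tok::Ident("b".to_string()),
        Tok::Syntax(')'),
    ];
    println!("{:?}", parse_parameters(&tokens, 0));
}
```

<p>The shape of the loop is the same as above. The only differences are
that every token access goes through <code>get</code> and that a non-identifier
in parameter position is rejected rather than stored.</p>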
<p>Next we parse all statements in the function body until we find the
<code>end</code> keyword.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">statements</span>: <span class="nb">Vec</span><span class="o"><</span><span class="n">Statement</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[];</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="o">!</span><span class="n">expect_keyword</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">"end"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_statement</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">))</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">;</span>
<span class="w"> </span><span class="n">statements</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">stmt</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected valid statement in function declaration:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past end</span>
<span class="w"> </span><span class="nb">Some</span><span class="p">((</span>
<span class="w"> </span><span class="n">Statement</span>::<span class="n">FunctionDeclaration</span><span class="p">(</span><span class="n">FunctionDeclaration</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">name</span><span class="p">,</span>
<span class="w"> </span><span class="n">parameters</span><span class="p">,</span>
<span class="w"> </span><span class="n">body</span>: <span class="nc">statements</span><span class="p">,</span>
<span class="w"> </span><span class="p">}),</span>
<span class="w"> </span><span class="n">next_index</span><span class="p">,</span>
<span class="w"> </span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
<p>Phew! We're halfway through the parser.</p>
<h4 id="return-statements">Return statements</h4><p>Return statements just check for the <code>return</code> keyword, an expression,
and a semicolon.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_return</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="kp">&</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_keyword</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">"return"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past return</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_expression</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">is_none</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected valid expression in return statement:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="p">(</span><span class="n">expr</span><span class="p">,</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">";"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected semicolon in return statement:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past semicolon</span>
<span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">Statement</span>::<span class="n">Return</span><span class="p">(</span><span class="n">Return</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">expression</span>: <span class="nc">expr</span><span class="w"> </span><span class="p">}),</span><span class="w"> </span><span class="n">next_index</span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
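<p>As an aside, the <code>is_none()</code> check followed by <code>unwrap()</code> can
also be written as a single <code>let ... else</code>, which needs Rust 1.65 or
newer. Below is a sketch of that style. The <code>Tok</code> and <code>Return</code>
types and the one-token <code>parse_expression</code> here are simplified,
hypothetical stand-ins so the example is self-contained. They are not
the definitions used in this post.</p>

```rust
// Simplified stand-ins for the post's Token and Return types.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Keyword(&'static str),
    Number(i64),
    Syntax(char),
}

#[derive(Debug, PartialEq)]
struct Return {
    expression: i64,
}

// Stand-in for parse_expression: accepts only a single number.
fn parse_expression(tokens: &[Tok], index: usize) -> Option<(i64, usize)> {
    match tokens.get(index)? {
        Tok::Number(n) => Some((*n, index + 1)),
        _ => None,
    }
}

fn parse_return(tokens: &[Tok], index: usize) -> Option<(Return, usize)> {
    if tokens.get(index) != Some(&Tok::Keyword("return")) {
        return None;
    }
    // Bind the result or bail out in one step; no unwrap needed.
    let Some((expression, next_index)) = parse_expression(tokens, index + 1) else {
        println!("Expected valid expression in return statement");
        return None;
    };
    if tokens.get(next_index) != Some(&Tok::Syntax(';')) {
        println!("Expected semicolon in return statement");
        return None;
    }
    Some((Return { expression }, next_index + 1)) // Skip past semicolon
}

fn main() {
    let tokens = vec![Tok::Keyword("return"), Tok::Number(42), Tok::Syntax(';')];
    println!("{:?}", parse_return(&tokens, 0));
}
```

<p>Whether this reads better is a matter of taste. The behavior is the
same as the explicit check, just with one fewer intermediate variable.</p>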
<h4 id="local-declarations">Local declarations</h4><p>Local declarations start with the <code>local</code> keyword, then the local
name, then an equal sign, then an expression, and then a semicolon.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_local</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="kp">&</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-> <span class="nb">Option</span><span class="o"><</span><span class="p">(</span><span class="n">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_keyword</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">"local"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past local</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_identifier</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected valid identifier for local name:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past name</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">"="</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected = syntax after local name:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past =</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_expression</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">is_none</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected valid expression in local declaration:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="p">(</span><span class="n">expr</span><span class="p">,</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">";"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected semicolon in local declaration:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past semicolon</span>
<span class="w"> </span><span class="nb">Some</span><span class="p">((</span>
<span class="w"> </span><span class="n">Statement</span>::<span class="n">Local</span><span class="p">(</span><span class="n">Local</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">name</span><span class="p">,</span>
<span class="w"> </span><span class="n">expression</span>: <span class="nc">expr</span><span class="p">,</span>
<span class="w"> </span><span class="p">}),</span>
<span class="w"> </span><span class="n">next_index</span><span class="p">,</span>
<span class="w"> </span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
<h4 id="if-statements">If statements</h4><p>This implementation of Lua doesn't support <code>elseif</code> (or <code>else</code>), so parsing <code>if</code>
just checks for the <code>if</code> keyword followed by a test expression, then
the <code>then</code> keyword, then the if body (a list of statements), and then the
<code>end</code> keyword.</p>
<div class="highlight"><pre><span></span><span class="sx">fn</span><span class="w"> </span><span class="nl">parse_if(raw</span><span class="p">:</span><span class="w"> </span><span class="sx">&[char],</span><span class="w"> </span><span class="nl">tokens</span><span class="p">:</span><span class="w"> </span><span class="sx">&[Token],</span><span class="w"> </span><span class="nl">index</span><span class="p">:</span><span class="w"> </span><span class="sx">usize)</span><span class="w"> </span><span class="sx">-></span><span class="w"> </span><span class="sx">Option<(Statement,</span><span class="w"> </span><span class="sx">usize)></span><span class="w"> </span><span class="sx">{</span>
<span class="w"> </span><span class="sx">if</span><span class="w"> </span><span class="sx">!expect_keyword(tokens,</span><span class="w"> </span><span class="sx">index,</span><span class="w"> </span><span class="s2">"if"</span><span class="sx">)</span><span class="w"> </span><span class="sx">{</span>
<span class="w"> </span><span class="kr">return</span><span class="w"> </span><span class="sx">None;</span>
<span class="w"> </span><span class="sx">}</span>
<span class="w"> </span><span class="sx">let</span><span class="w"> </span><span class="sx">mut</span><span class="w"> </span><span class="sx">next_index</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">index</span><span class="w"> </span><span class="sx">+</span><span class="w"> </span><span class="sx">1;</span><span class="w"> </span><span class="sx">//</span><span class="w"> </span><span class="sx">Skip</span><span class="w"> </span><span class="sx">past</span><span class="w"> </span><span class="sx">if</span>
<span class="w"> </span><span class="sx">let</span><span class="w"> </span><span class="sx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">parse_expression(raw,</span><span class="w"> </span><span class="sx">tokens,</span><span class="w"> </span><span class="sx">next_index);</span>
<span class="w"> </span><span class="sx">if</span><span class="w"> </span><span class="sx">res.is_none()</span><span class="w"> </span><span class="sx">{</span>
<span class="w"> </span><span class="sx">println!(</span>
<span class="w"> </span><span class="s2">"{}"</span><span class="sx">,</span>
<span class="w"> </span><span class="sx">tokens[next_index]</span>
<span class="w"> </span><span class="sx">.loc</span>
<span class="w"> </span><span class="sx">.debug(raw,</span><span class="w"> </span><span class="s2">"Expected valid expression for if test:"</span><span class="sx">)</span>
<span class="w"> </span><span class="sx">);</span>
<span class="w"> </span><span class="kr">return</span><span class="w"> </span><span class="sx">None;</span>
<span class="w"> </span><span class="sx">}</span>
<span class="w"> </span><span class="sx">let</span><span class="w"> </span><span class="sx">(test,</span><span class="w"> </span><span class="sx">next_next_index)</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">res.unwrap();</span>
<span class="w"> </span><span class="sx">next_index</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">next_next_index;</span>
<span class="w"> </span><span class="sx">if</span><span class="w"> </span><span class="sx">!expect_keyword(tokens,</span><span class="w"> </span><span class="sx">next_index,</span><span class="w"> </span><span class="s2">"then"</span><span class="sx">)</span><span class="w"> </span><span class="sx">{</span>
<span class="w"> </span><span class="kr">return</span><span class="w"> </span><span class="sx">None;</span>
<span class="w"> </span><span class="sx">}</span>
<span class="w"> </span><span class="sx">next_index</span><span class="w"> </span><span class="sx">+</span><span class="p">=</span><span class="w"> </span><span class="sx">1;</span><span class="w"> </span><span class="sx">//</span><span class="w"> </span><span class="sx">Skip</span><span class="w"> </span><span class="sx">past</span><span class="w"> </span><span class="sx">then</span>
<span class="w"> </span><span class="sx">let</span><span class="w"> </span><span class="sx">mut</span><span class="w"> </span><span class="nl">statements</span><span class="p">:</span><span class="w"> </span><span class="sx">Vec<Statement></span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">vec![];</span>
<span class="w"> </span><span class="sx">while</span><span class="w"> </span><span class="sx">!expect_keyword(tokens,</span><span class="w"> </span><span class="sx">next_index,</span><span class="w"> </span><span class="s2">"end"</span><span class="sx">)</span><span class="w"> </span><span class="sx">{</span>
<span class="w"> </span><span class="sx">let</span><span class="w"> </span><span class="sx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">parse_statement(raw,</span><span class="w"> </span><span class="sx">tokens,</span><span class="w"> </span><span class="sx">next_index);</span>
<span class="w"> </span><span class="sx">if</span><span class="w"> </span><span class="sx">let</span><span class="w"> </span><span class="sx">Some((stmt,</span><span class="w"> </span><span class="sx">next_next_index))</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">res</span><span class="w"> </span><span class="sx">{</span>
<span class="w"> </span><span class="sx">next_index</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">next_next_index;</span>
<span class="w"> </span><span class="sx">statements.push(stmt);</span>
<span class="w"> </span><span class="sx">}</span><span class="w"> </span><span class="sx">else</span><span class="w"> </span><span class="sx">{</span>
<span class="w"> </span><span class="sx">println!(</span>
<span class="w"> </span><span class="s2">"{}"</span><span class="sx">,</span>
<span class="w"> </span><span class="sx">tokens[next_index]</span>
<span class="w"> </span><span class="sx">.loc</span>
<span class="w"> </span><span class="sx">.debug(raw,</span><span class="w"> </span><span class="s2">"Expected valid statement in if body:"</span><span class="sx">)</span>
<span class="w"> </span><span class="sx">);</span>
<span class="w"> </span><span class="kr">return</span><span class="w"> </span><span class="sx">None;</span>
<span class="w"> </span><span class="sx">}</span>
<span class="w"> </span><span class="sx">}</span>
<span class="w"> </span><span class="sx">next_index</span><span class="w"> </span><span class="sx">+</span><span class="p">=</span><span class="w"> </span><span class="sx">1;</span><span class="w"> </span><span class="sx">//</span><span class="w"> </span><span class="sx">Skip</span><span class="w"> </span><span class="sx">past</span><span class="w"> </span><span class="sx">end</span>
<span class="w"> </span><span class="sx">Some((</span>
<span class="w"> </span><span class="nl">Statement</span><span class="p">::</span><span class="nl">If(If</span><span class="w"> </span><span class="sx">{</span>
<span class="w"> </span><span class="sx">test,</span>
<span class="w"> </span><span class="nl">body</span><span class="p">:</span><span class="w"> </span><span class="sx">statements,</span>
<span class="w"> </span><span class="sx">}),</span>
<span class="w"> </span><span class="sx">next_index,</span>
<span class="w"> </span><span class="sx">))</span>
<span class="sx">}</span>
</pre></div>
<p>And goshdarnit we're done parsing.</p>
<h3 id="compiling-to-a-made-up-virtual-machine">Compiling to a made up virtual machine</h3><p>This virtual machine will be entirely stack-based other than the stack
pointer and program counter.</p>
<p>The calling convention is that the caller will put arguments on the
stack followed by the frame pointer, the program counter, and then the
number of arguments (for cleanup). Then it will alter the program
counter and frame pointer. Then the caller will allocate space on the
stack for all arguments and locals within the function.</p>
<p>For simplicity in addressing modes, the function declaration, once
jumped to, will copy the arguments from before the frame pointer to in
front of it (yes I know, I know, this is silly).</p>
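<p>To make the calling convention concrete, here is a rough sketch of the caller's side on a plain <code>Vec&lt;i32&gt;</code> stack. The <code>Vm</code> struct and the <code>call</code> function are hypothetical names for illustration, not the virtual machine built later in this post. Saving <code>pc + 1</code> as the return address and counting arguments inside <code>nlocals</code> are assumptions of this sketch.</p>

```rust
// Illustrative only: a hypothetical VM state with an explicit stack.
struct Vm {
    stack: Vec<i32>,
    fp: i32, // frame pointer
    pc: i32, // program counter
}

// Performs the caller's half of the calling convention described above.
fn call(vm: &mut Vm, args: &[i32], target: i32, nlocals: usize) {
    // 1. Arguments go on the stack first.
    vm.stack.extend_from_slice(args);
    // 2. Then the saved frame pointer, the return address (assumed to be
    //    pc + 1 here), and the number of arguments (for cleanup).
    vm.stack.push(vm.fp);
    vm.stack.push(vm.pc + 1);
    vm.stack.push(args.len() as i32);
    // 3. Alter the program counter and frame pointer.
    vm.pc = target;
    vm.fp = vm.stack.len() as i32;
    // 4. Reserve zeroed slots in front of the frame pointer for arguments
    //    and locals. `nlocals` is assumed to include the arguments.
    for _ in 0..nlocals {
        vm.stack.push(0);
    }
}
```

<p>Calling with two arguments and three slots leaves the arguments behind the frame pointer, followed by the three saved values, followed by the fresh slots the function body will address.</p>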
<p>The virtual machine will support add, subtract, less than operations
as well as jump, jump-if-not-zero, return, and call. It will support a
few more memory-specific instructions for loading literals, loading
identifiers, and managing arguments.</p>
<p>I'll explain the non-obvious instructions as we implement them.</p>
<div class="highlight"><pre><span></span><span class="k">use</span><span class="w"> </span><span class="k">crate</span>::<span class="n">parse</span>::<span class="o">*</span><span class="p">;</span>
<span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">collections</span>::<span class="n">HashMap</span><span class="p">;</span>
<span class="cp">#[derive(Debug)]</span>
<span class="k">enum</span> <span class="nc">Instruction</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">DupPlusFP</span><span class="p">(</span><span class="kt">i32</span><span class="p">),</span>
<span class="w"> </span><span class="n">MoveMinusFP</span><span class="p">(</span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="p">),</span>
<span class="w"> </span><span class="n">MovePlusFP</span><span class="p">(</span><span class="kt">usize</span><span class="p">),</span>
<span class="w"> </span><span class="n">Store</span><span class="p">(</span><span class="kt">i32</span><span class="p">),</span>
<span class="w"> </span><span class="n">Return</span><span class="p">,</span>
<span class="w"> </span><span class="n">JumpIfNotZero</span><span class="p">(</span><span class="nb">String</span><span class="p">),</span>
<span class="w"> </span><span class="n">Jump</span><span class="p">(</span><span class="nb">String</span><span class="p">),</span>
<span class="w"> </span><span class="n">Call</span><span class="p">(</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">),</span>
<span class="w"> </span><span class="n">Add</span><span class="p">,</span>
<span class="w"> </span><span class="n">Subtract</span><span class="p">,</span>
<span class="w"> </span><span class="n">LessThan</span><span class="p">,</span>
<span class="p">}</span>
</pre></div>
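<p>As a quick illustration of what "stack-based" means for the arithmetic instructions, here is a minimal sketch. The <code>Op</code> enum and <code>run</code> function are made up for this example and are much smaller than the real <code>Instruction</code> set. It assumes the right operand is popped first and that a comparison pushes <code>1</code> or <code>0</code>.</p>

```rust
// Hypothetical, cut-down instruction set for illustration.
#[derive(Debug, Clone, Copy)]
enum Op {
    Push(i32),
    Add,
    Subtract,
    LessThan,
}

// Runs the ops against an empty stack and returns the final stack.
fn run(ops: &[Op]) -> Vec<i32> {
    let mut stack = Vec::new();
    for op in ops {
        match *op {
            Op::Push(n) => stack.push(n),
            Op::Add | Op::Subtract | Op::LessThan => {
                // Operands come off in reverse order: right first, then left.
                let right = stack.pop().expect("missing right operand");
                let left = stack.pop().expect("missing left operand");
                stack.push(match *op {
                    Op::Add => left + right,
                    Op::Subtract => left - right,
                    // Comparison result is stored as 1 (true) or 0 (false).
                    _ => (left < right) as i32,
                });
            }
        }
    }
    stack
}
```

<p>Every binary operation consumes its two operands from the top of the stack and leaves a single result there, which is why none of these instructions need register or address operands.</p>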
<p>The result of compiling will be a <code>Program</code> instance. This instance
will contain symbol information and the actual instructions to run.</p>
<div class="highlight"><pre><span></span><span class="cp">#[derive(Debug)]</span>
<span class="k">struct</span> <span class="nc">Symbol</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">location</span>: <span class="kt">i32</span><span class="p">,</span>
<span class="w"> </span><span class="n">narguments</span>: <span class="kt">usize</span><span class="p">,</span>
<span class="w"> </span><span class="n">nlocals</span>: <span class="kt">usize</span><span class="p">,</span>
<span class="p">}</span>
<span class="cp">#[derive(Debug)]</span>
<span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Program</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">syms</span>: <span class="nc">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="n">Symbol</span><span class="o">></span><span class="p">,</span>
<span class="w"> </span><span class="n">instructions</span>: <span class="nb">Vec</span><span class="o"><</span><span class="n">Instruction</span><span class="o">></span><span class="p">,</span>
<span class="p">}</span>
</pre></div>
<p>Compiling, similar to parsing, just calls the helper
<code>compile_statement</code> for each statement in the AST.</p>
<div class="highlight"><pre><span></span><span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">compile</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">ast</span>: <span class="nc">Ast</span><span class="p">)</span><span class="w"> </span>-> <span class="nc">Program</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">locals</span>: <span class="nc">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">HashMap</span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">pgrm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Program</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">syms</span>: <span class="nc">HashMap</span>::<span class="n">new</span><span class="p">(),</span>
<span class="w"> </span><span class="n">instructions</span>: <span class="nb">Vec</span>::<span class="n">new</span><span class="p">(),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">stmt</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">ast</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">compile_statement</span><span class="p">(</span><span class="o">&</span><span class="k">mut</span><span class="w"> </span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="k">mut</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">pgrm</span>
<span class="p">}</span>
</pre></div>
<p>And <code>compile_statement</code> dispatches to additional helpers based on the
kind of statement.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_statement</span><span class="p">(</span>
<span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span>
<span class="w"> </span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span>
<span class="w"> </span><span class="n">locals</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">></span><span class="p">,</span>
<span class="w"> </span><span class="n">stmt</span>: <span class="nc">Statement</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">stmt</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Statement</span>::<span class="n">FunctionDeclaration</span><span class="p">(</span><span class="n">fd</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">compile_declaration</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">fd</span><span class="p">),</span>
<span class="w"> </span><span class="n">Statement</span>::<span class="n">Return</span><span class="p">(</span><span class="n">r</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">compile_return</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="p">),</span>
<span class="w"> </span><span class="n">Statement</span>::<span class="n">If</span><span class="p">(</span><span class="n">if_</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">compile_if</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">if_</span><span class="p">),</span>
<span class="w"> </span><span class="n">Statement</span>::<span class="n">Local</span><span class="p">(</span><span class="n">loc</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">compile_local</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">loc</span><span class="p">),</span>
<span class="w"> </span><span class="n">Statement</span>::<span class="n">Expression</span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">e</span><span class="p">),</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<h4 id="function-declarations">Function declarations</h4><p>Let's do the hard one first. First off, function declarations will
include an unconditional guard around them so that we can evaluate
from the 0th instruction at the top-level and have only
non-function-declaration statements be evaluated.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_declaration</span><span class="p">(</span>
<span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span>
<span class="w"> </span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span>
<span class="w"> </span><span class="n">_</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">></span><span class="p">,</span>
<span class="w"> </span><span class="n">fd</span>: <span class="nc">FunctionDeclaration</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Jump to end of function to guard top-level</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">done_label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">format!</span><span class="p">(</span><span class="s">"function_done_{}"</span><span class="p">,</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">());</span>
<span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span>
<span class="w"> </span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Jump</span><span class="p">(</span><span class="n">done_label</span><span class="p">.</span><span class="n">clone</span><span class="p">()));</span>
</pre></div>
<p>Then we'll add another limitation/simplification that local variables
are only accessible within the current function scope.</p>
<p>For each parameter, we'll copy the parameter on the stack before the
frame pointer to a place in front of the frame pointer. This gets
around addressing mode limitations in our virtual machine.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">new_locals</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">HashMap</span>::<span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">></span>::<span class="n">new</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">function_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">narguments</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fd</span><span class="p">.</span><span class="n">parameters</span><span class="p">.</span><span class="n">len</span><span class="p">();</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">param</span><span class="p">)</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">fd</span><span class="p">.</span><span class="n">parameters</span><span class="p">.</span><span class="n">iter</span><span class="p">().</span><span class="n">enumerate</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">MoveMinusFP</span><span class="p">(</span>
<span class="w"> </span><span class="n">i</span><span class="p">,</span>
<span class="w"> </span><span class="n">narguments</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span>
<span class="w"> </span><span class="p">));</span>
<span class="w"> </span><span class="n">new_locals</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">param</span><span class="p">.</span><span class="n">value</span><span class="p">.</span><span class="n">clone</span><span class="p">(),</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
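<p>The offset arithmetic in that loop is easier to see with numbers. The sketch below is not part of the compiler: <code>move_operands</code> just reproduces the two operands computed above, and <code>apply_move</code> is a hypothetical interpretation of one move, assuming three saved values (frame pointer, return address, argument count) sit between the arguments and the frame pointer.</p>

```rust
// Assumed number of values the caller saves between the arguments and the
// frame pointer: old frame pointer, return address, argument count.
const SAVED_SLOTS: i32 = 3;

// (local slot, offset back from the frame pointer) for each parameter,
// mirroring the arithmetic in the loop above.
fn move_operands(narguments: usize) -> Vec<(usize, i32)> {
    (0..narguments)
        .map(|i| (i, narguments as i32 - (i as i32 + 1)))
        .collect()
}

// Hypothetical effect of a single move: copy an argument from behind the
// frame pointer into a slot in front of it.
fn apply_move(stack: &mut [i32], fp: usize, local_slot: usize, fp_offset: i32) {
    // Offset 0 refers to the last argument, just behind the saved values.
    let src = fp as i32 - (fp_offset + SAVED_SLOTS + 1);
    stack[fp + local_slot] = stack[src as usize];
}
```

<p>For three parameters the operands come out as <code>(0, 2)</code>, <code>(1, 1)</code>, <code>(2, 0)</code>: the first parameter is the furthest from the frame pointer, the last one is the closest.</p>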
<p>Then we compile the body.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">stmt</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">fd</span><span class="p">.</span><span class="n">body</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">compile_statement</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="k">mut</span><span class="w"> </span><span class="n">new_locals</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Once the body is compiled we know the total number of locals so we can
fill out the symbol table correctly. Importantly, the location was
already captured (as <code>function_index</code>) before the parameters and body
were compiled, so it points at the first instruction of the function,
just after the guard jump.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span>
<span class="w"> </span><span class="n">fd</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">value</span><span class="p">,</span>
<span class="w"> </span><span class="n">Symbol</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">location</span>: <span class="nc">function_index</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">,</span>
<span class="w"> </span><span class="n">narguments</span><span class="p">,</span>
<span class="w"> </span><span class="n">nlocals</span>: <span class="nc">new_locals</span><span class="p">.</span><span class="n">keys</span><span class="p">().</span><span class="n">len</span><span class="p">(),</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">);</span>
</pre></div>
<p>Finally we add a symbol linking the done label for the function to
the position of the end of the function. Again, this allows us to skip
past the function declaration when evaluating instructions from 0 to
N.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span>
<span class="w"> </span><span class="n">done_label</span><span class="p">,</span>
<span class="w"> </span><span class="n">Symbol</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">location</span>: <span class="nc">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">,</span>
<span class="w"> </span><span class="n">narguments</span>: <span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="n">nlocals</span>: <span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Ok that wasn't so bad. And the rest are simpler.</p>
<h4 id="local-declarations">Local declarations</h4><p>The local name is stored in a locals table, mapped to the current
number of locals (including arguments), and then the expression for
the local is compiled. This allows the compiler to turn <code>identifier</code>
token lookups into simply an offset from the frame pointer.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_local</span><span class="p">(</span>
<span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span>
<span class="w"> </span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span>
<span class="w"> </span><span class="n">locals</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">></span><span class="p">,</span>
<span class="w"> </span><span class="n">local</span>: <span class="nc">Local</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">locals</span><span class="p">.</span><span class="n">keys</span><span class="p">().</span><span class="n">len</span><span class="p">();</span>
<span class="w"> </span><span class="n">locals</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">local</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">);</span>
<span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">local</span><span class="p">.</span><span class="n">expression</span><span class="p">);</span>
<span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">MovePlusFP</span><span class="p">(</span><span class="n">index</span><span class="p">));</span>
<span class="p">}</span>
</pre></div>
<p>Specifically, the instruction pattern is to evaluate the expression,
which leaves its value on top of the stack, and then move that value
into the local's slot relative to the frame pointer.</p>
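<p>To make the slot assignment concrete, here is a minimal sketch, separate
from the compiler itself, of how a locals table hands out frame-relative
offsets. The <code>declare</code> and <code>example_offsets</code> helpers are
hypothetical names for illustration; <code>declare</code> only mirrors the first
two lines of <code>compile_local</code>.</p>

```rust
use std::collections::HashMap;

// Hypothetical helper: gives `name` the next free slot, the same way
// compile_local does, and returns its offset from the frame pointer.
fn declare(locals: &mut HashMap<String, i32>, name: &str) -> i32 {
    let index = locals.len() as i32;
    locals.insert(name.to_string(), index);
    index
}

// Declares two arguments and then one local, returning their offsets.
fn example_offsets() -> (i32, i32, i32) {
    let mut locals = HashMap::new();
    let a = declare(&mut locals, "a"); // first argument
    let b = declare(&mut locals, "b"); // second argument
    let tmp = declare(&mut locals, "tmp"); // first local after the arguments
    (a, b, tmp)
}
```

<p>Because arguments go into the same table first, a local declared after
two arguments ends up at offset 2.</p>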
<h4 id="literals">Literals</h4><p>Number literals use the <code>store</code> instruction for pushing a number onto
the stack. Identifier literals are copied to the top of the stack from
their position relative to the frame pointer.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_literal</span><span class="p">(</span>
<span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span>
<span class="w"> </span><span class="n">_</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span>
<span class="w"> </span><span class="n">locals</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">></span><span class="p">,</span>
<span class="w"> </span><span class="n">lit</span>: <span class="nc">Literal</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">lit</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Literal</span>::<span class="n">Number</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">.</span><span class="n">value</span><span class="p">.</span><span class="n">parse</span>::<span class="o"><</span><span class="kt">i32</span><span class="o">></span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Store</span><span class="p">(</span><span class="n">n</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Literal</span>::<span class="n">Identifier</span><span class="p">(</span><span class="n">ident</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span>
<span class="w"> </span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">DupPlusFP</span><span class="p">(</span><span class="n">locals</span><span class="p">[</span><span class="o">&</span><span class="n">ident</span><span class="p">.</span><span class="n">value</span><span class="p">]));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
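<p>As a rough illustration of what these frame-relative instructions might
do at runtime, here is a small sketch. The semantics of
<code>move_plus_fp</code> and <code>dup_plus_fp</code> below are my assumptions
based on the descriptions above (including growing the stack when a slot does
not exist yet), not the virtual machine's actual code.</p>

```rust
// Assumed semantics: pop the top of the stack and write it into the slot
// at fp + offset, growing the stack with zeros if the slot does not exist.
fn move_plus_fp(data: &mut Vec<i32>, fp: usize, offset: usize) {
    let value = data.pop().unwrap();
    let index = fp + offset;
    while index >= data.len() {
        data.push(0);
    }
    data[index] = value;
}

// Assumed semantics: copy the slot at fp + offset onto the top of the stack.
fn dup_plus_fp(data: &mut Vec<i32>, fp: usize, offset: usize) {
    let value = data[fp + offset];
    data.push(value);
}

// Roughly `local x = 7` followed by a read of `x`, with the frame at 0.
fn literal_example() -> Vec<i32> {
    let mut data = vec![];
    data.push(7); // what Store(7) would do
    move_plus_fp(&mut data, 0, 0); // what MovePlusFP(0) would do
    dup_plus_fp(&mut data, 0, 0); // what DupPlusFP(0) would do
    data
}
```

<p>After the three steps the stack holds the local in slot 0 and a copy of
it on top, ready to be consumed by the next instruction.</p>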
<h4 id="function-calls">Function calls</h4><p>Pretty simple: just compile all the arguments and then issue a call
instruction.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_function_call</span><span class="p">(</span>
<span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span>
<span class="w"> </span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span>
<span class="w"> </span><span class="n">locals</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">></span><span class="p">,</span>
<span class="w"> </span><span class="n">fc</span>: <span class="nc">FunctionCall</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fc</span><span class="p">.</span><span class="n">arguments</span><span class="p">.</span><span class="n">len</span><span class="p">();</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">arg</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">fc</span><span class="p">.</span><span class="n">arguments</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span>
<span class="w"> </span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Call</span><span class="p">(</span><span class="n">fc</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">len</span><span class="p">));</span>
<span class="p">}</span>
</pre></div>
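<p>Here is a simplified, standalone sketch of the instruction sequence this
produces. It assumes every argument is a number literal, and the trimmed-down
<code>Instruction</code> enum and <code>compile_call</code> helper are
illustrative stand-ins rather than the real definitions.</p>

```rust
// Trimmed-down stand-in for the real instruction set.
#[derive(Debug, PartialEq)]
enum Instruction {
    Store(i32),
    Call(String, usize),
}

// Hypothetical helper: push each (number literal) argument in order, then
// emit a call that records how many arguments were pushed.
fn compile_call(name: &str, args: &[i32]) -> Vec<Instruction> {
    let mut out = Vec::new();
    for arg in args {
        out.push(Instruction::Store(*arg));
    }
    out.push(Instruction::Call(name.to_string(), args.len()));
    out
}
```

<p>So a call like <code>f(1, 2)</code> would become two stores followed by a
call instruction carrying the argument count of 2.</p>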
<h4 id="binary-operations">Binary operations</h4><p>Binary operations compile the left, then the right, and then issue an
instruction based on the operator. All the operators are built in and
act on the top two elements of the stack.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_binary_operation</span><span class="p">(</span>
<span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span>
<span class="w"> </span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span>
<span class="w"> </span><span class="n">locals</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">></span><span class="p">,</span>
<span class="w"> </span><span class="n">bop</span>: <span class="nc">BinaryOperation</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">bop</span><span class="p">.</span><span class="n">left</span><span class="p">);</span>
<span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">bop</span><span class="p">.</span><span class="n">right</span><span class="p">);</span>
<span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">bop</span><span class="p">.</span><span class="n">operator</span><span class="p">.</span><span class="n">value</span><span class="p">.</span><span class="n">as_str</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s">"+"</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Add</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="s">"-"</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Subtract</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="s">"<"</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">LessThan</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="fm">panic!</span><span class="p">(</span>
<span class="w"> </span><span class="s">"{}"</span><span class="p">,</span>
<span class="w"> </span><span class="n">bop</span><span class="p">.</span><span class="n">operator</span>
<span class="w"> </span><span class="p">.</span><span class="n">loc</span>
<span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">"Unable to compile binary operation:"</span><span class="p">)</span>
<span class="w"> </span><span class="p">),</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
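<p>Compiling left, then right, then the operator means nested expressions
come out in postfix order. The following standalone sketch shows that on a
tiny expression tree. The <code>Expr</code> type and the
<code>compile_expr</code> and <code>postfix_example</code> functions are
simplified, hypothetical stand-ins for the real AST and compiler.</p>

```rust
// Trimmed-down stand-in for the real instruction set.
#[derive(Debug, PartialEq)]
enum Instruction {
    Store(i32),
    Add,
    Subtract,
    LessThan,
}

// Hypothetical, simplified expression tree.
enum Expr {
    Number(i32),
    Binary(Box<Expr>, char, Box<Expr>),
}

// Emit the left operand, then the right operand, then the operator.
fn compile_expr(expr: &Expr, out: &mut Vec<Instruction>) {
    match expr {
        Expr::Number(n) => out.push(Instruction::Store(*n)),
        Expr::Binary(left, op, right) => {
            compile_expr(left, out);
            compile_expr(right, out);
            out.push(match *op {
                '+' => Instruction::Add,
                '-' => Instruction::Subtract,
                '<' => Instruction::LessThan,
                _ => panic!("Unable to compile binary operation: {}", op),
            });
        }
    }
}

// Compiles the expression (1 + 2) < 4.
fn postfix_example() -> Vec<Instruction> {
    let sum = Expr::Binary(Box::new(Expr::Number(1)), '+', Box::new(Expr::Number(2)));
    let expr = Expr::Binary(Box::new(sum), '<', Box::new(Expr::Number(4)));
    let mut out = Vec::new();
    compile_expr(&expr, &mut out);
    out
}
```

<p>The inner addition is fully emitted before the outer comparison, so by
the time the comparison runs its two operands are already on the stack.</p>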
<h4 id="expressions">Expressions</h4><p>Compiling expressions just dispatches to a compile helper based on the
type of expression. We've already written those three helpers.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_expression</span><span class="p">(</span>
<span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span>
<span class="w"> </span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span>
<span class="w"> </span><span class="n">locals</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">></span><span class="p">,</span>
<span class="w"> </span><span class="n">exp</span>: <span class="nc">Expression</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">exp</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Expression</span>::<span class="n">BinaryOperation</span><span class="p">(</span><span class="n">bop</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">compile_binary_operation</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">bop</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Expression</span>::<span class="n">FunctionCall</span><span class="p">(</span><span class="n">fc</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">compile_function_call</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">fc</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Expression</span>::<span class="n">Literal</span><span class="p">(</span><span class="n">lit</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">compile_literal</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">lit</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
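<h4 id="if">If</h4><p>First we compile the conditional test and then we jump to after the if
when the test fails. Note that the virtual machine's handler for
<code>JumpIfNotZero</code>, shown below, checks whether the value popped off
the stack is zero, so despite the name the jump is taken when the test is
false.</p>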
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_if</span><span class="p">(</span><span class="n">pgrm</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">locals</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="n">if_</span>: <span class="nc">If</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">if_</span><span class="p">.</span><span class="n">test</span><span class="p">);</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">done_label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">format!</span><span class="p">(</span><span class="s">"if_else_{}"</span><span class="p">,</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">());</span>
<span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span>
<span class="w"> </span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">JumpIfNotZero</span><span class="p">(</span><span class="n">done_label</span><span class="p">.</span><span class="n">clone</span><span class="p">()));</span>
</pre></div>
<p>Then we compile the body.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">stmt</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">if_</span><span class="p">.</span><span class="n">body</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">compile_statement</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And finally make sure we insert the <code>done</code> symbol in the right place after the if.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span>
<span class="w"> </span><span class="n">done_label</span><span class="p">,</span>
<span class="w"> </span><span class="n">Symbol</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">location</span>: <span class="nc">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span>
<span class="w"> </span><span class="n">nlocals</span>: <span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="n">narguments</span>: <span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
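<p>To see where the label ends up, here is a standalone sketch that
replaces real instructions with placeholder strings. The
<code>compile_if_sketch</code> function and the placeholder names are
hypothetical. The note about why the location is one less than the
instruction count is my assumption about how the jump handler advances the
program counter.</p>

```rust
use std::collections::HashMap;

// Hypothetical sketch: one placeholder instruction for the test, one jump,
// and `body_len` placeholder instructions for the body.
fn compile_if_sketch(body_len: usize) -> (Vec<String>, HashMap<String, i32>) {
    let mut instructions = vec!["test".to_string()];
    let mut syms = HashMap::new();

    // The label is named after the index of the jump instruction itself.
    let done_label = format!("if_else_{}", instructions.len());
    instructions.push(format!("jump_if_not_zero {}", done_label));

    for i in 0..body_len {
        instructions.push(format!("body {}", i));
    }

    // Points at the last body instruction. Assumption: the jump handler
    // adds one to the program counter after landing here.
    syms.insert(done_label, instructions.len() as i32 - 1);
    (instructions, syms)
}
```

<p>With a two-instruction body there are four instructions in total and the
label <code>if_else_1</code> maps to location 3, the last instruction of the
body.</p>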
<h4 id="return">Return</h4><p>The final statement type is return. We simply compile the return
expression and issue a return instruction.</p>
<div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_return</span><span class="p">(</span>
<span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span>
<span class="w"> </span><span class="n">raw</span>: <span class="kp">&</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span>
<span class="w"> </span><span class="n">locals</span>: <span class="kp">&</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o"><</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">></span><span class="p">,</span>
<span class="w"> </span><span class="n">ret</span>: <span class="nc">Return</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">ret</span><span class="p">.</span><span class="n">expression</span><span class="p">);</span>
<span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Return</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>That's it for the compiler! Now the trickiest part. I lost a few hours
debugging and iterating on the next bit.</p>
<h3 id="the-virtual-machine">The virtual machine</h3><p>Ok so the easy part is that there are only two registers, a program
counter and a frame pointer. There's also a data stack. The frame
pointer points to the location on the data stack where each function
can start storing its locals.</p>
<p>Evaluation starts from 0 and goes until the last instruction.</p>
<div class="highlight"><pre><span></span><span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">eval</span><span class="p">(</span><span class="n">pgrm</span>: <span class="nc">Program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">pc</span>: <span class="kt">i32</span> <span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">fp</span>: <span class="kt">i32</span> <span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">data</span>: <span class="nb">Vec</span><span class="o"><</span><span class="kt">i32</span><span class="o">></span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[];</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="o">&</span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">[</span><span class="n">pc</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">usize</span><span class="p">]</span><span class="w"> </span><span class="p">{</span>
</pre></div>
<p>Each instruction will be responsible for incrementing the program
counter or having it jump around.</p>
<h4 id="addition,-subtraction,-less-than">Addition, subtraction, less than</h4><p>The easiest ones are the math operators. We pop the two operands off
the data stack (right first, then left), perform the operation, and push
the result.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">Add</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">left</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">right</span><span class="p">);</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Instruction</span>::<span class="n">Subtract</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">left</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">right</span><span class="p">);</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Instruction</span>::<span class="n">LessThan</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>The <code>store</code> instruction is another easy one. It just pushes a literal
number onto the stack.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">Store</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="o">*</span><span class="n">n</span><span class="p">);</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
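<h4 id="jump-variants">Jump variants</h4><p>The jump variants are easy too. Just grab the location and change the
program counter. If it's a conditional jump then test the condition
first. Note that despite its name, <code>JumpIfNotZero</code> as written here
takes the jump when the popped value is zero, that is, when the
condition was false.</p>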
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">JumpIfNotZero</span><span class="p">(</span><span class="n">label</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">top</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">top</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">[</span><span class="n">label</span><span class="p">].</span><span class="n">location</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Instruction</span>::<span class="n">Jump</span><span class="p">(</span><span class="n">label</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">[</span><span class="n">label</span><span class="p">].</span><span class="n">location</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
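<p>To see how the arithmetic, comparison, and jump instructions fit
together, here is a minimal standalone sketch of a dispatch loop. It
is not the implementation from this post: the <code>Op</code> enum,
<code>run</code>, and <code>branch_example</code> are hypothetical names, the
conditional jump is renamed <code>JumpIfZero</code> to match what it does, and
labels are assumed to map directly to the index of the next
instruction to run.</p>

```rust
use std::collections::HashMap;

// Hypothetical, cut-down instruction set for illustration only.
enum Op {
    Store(i32),
    Subtract,
    LessThan,
    // Jumps when the popped value is 0, i.e. when a comparison was false.
    JumpIfZero(&'static str),
    Jump(&'static str),
}

// Runs `ops` until pc walks off the end and returns the final data stack.
// Assumption: `labels` maps a label to the index of the next instruction.
fn run(ops: &[Op], labels: &HashMap<&'static str, usize>) -> Vec<i32> {
    let mut data: Vec<i32> = Vec::new();
    let mut pc = 0;
    while pc < ops.len() {
        match &ops[pc] {
            Op::Store(n) => {
                data.push(*n);
                pc += 1;
            }
            Op::Subtract => {
                let right = data.pop().unwrap();
                let left = data.pop().unwrap();
                data.push(left - right);
                pc += 1;
            }
            Op::LessThan => {
                let right = data.pop().unwrap();
                let left = data.pop().unwrap();
                data.push(if left < right { 1 } else { 0 });
                pc += 1;
            }
            Op::JumpIfZero(label) => {
                let top = data.pop().unwrap();
                pc = if top == 0 { labels[*label] } else { pc + 1 };
            }
            Op::Jump(label) => {
                pc = labels[*label];
            }
        }
    }
    data
}

// Roughly `if 1 < 2 then push 10 - 3 else push 99`.
fn branch_example() -> Vec<i32> {
    let ops = [
        Op::Store(1),
        Op::Store(2),
        Op::LessThan,
        Op::JumpIfZero("else"),
        Op::Store(10),
        Op::Store(3),
        Op::Subtract,
        Op::Jump("done"),
        Op::Store(99), // "else" points here
    ];
    let labels = HashMap::from([("else", 8), ("done", 9)]);
    run(&ops, &labels)
}
```

<p>Calling <code>branch_example()</code> from a <code>main</code> function leaves a
single value on the stack: the result of the branch that was taken.</p>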
<h4 id="loading-from-a-variable">Loading from a variable</h4><p>The <code>DupPlusFP</code> instruction copies a value from the stack (offset from the
frame pointer) onto the top of the stack. This is for references to
arguments and locals.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">DupPlusFP</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">data</span><span class="p">[(</span><span class="n">fp</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">usize</span><span class="p">]);</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<h4 id="storing-locals">Storing locals</h4><p>The <code>MovePlusFP</code> instruction is used by <code>compile_locals</code> to store a
local. Once the local's value has been compiled onto the stack, this
instruction pops it and writes it to the local's position relative to
the frame pointer.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">MovePlusFP</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="o">*</span><span class="n">i</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Accounts for top-level locals</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">index</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">;</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<h4 id="duplicating-arguments">Duplicating arguments</h4><p>The <code>MoveMinusFP</code> instruction is, again, a hack to work around limited
addressing modes in this minimal virtual machine. It copies
arguments from behind the frame pointer to in front of the frame
pointer.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">MoveMinusFP</span><span class="p">(</span><span class="n">local_offset</span><span class="p">,</span><span class="w"> </span><span class="n">fp_offset</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">fp</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">local_offset</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">[(</span><span class="n">fp</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">fp_offset</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">4</span><span class="p">))</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">usize</span><span class="p">];</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
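<p>The <code>+ 4</code> in that index is easier to see with a concrete
stack. The sketch below assumes the layout produced by the call
instruction in this post (arguments, then the saved frame pointer,
return address, and argument count, with the new frame pointer just
past them). The names <code>SAVED_SLOTS</code>, <code>argument_index</code>, and
<code>frame_example</code> are made up for illustration.</p>

```rust
// Slots that sit between the arguments and the new frame pointer:
// the saved frame pointer, the return address, and the argument count.
const SAVED_SLOTS: i32 = 3;

// Index of the argument `fp_offset` places back from the last argument.
// Same arithmetic as `fp - (fp_offset + 4)` in MoveMinusFP.
fn argument_index(fp: i32, fp_offset: i32) -> usize {
    (fp - (fp_offset + SAVED_SLOTS + 1)) as usize
}

// Returns (last argument, first argument) for a call like f(10, 20).
fn frame_example() -> (i32, i32) {
    // Assumed stack: the two arguments, then the saved frame pointer (0),
    // the return address (7), and the argument count (2).
    let data = vec![10, 20, 0, 7, 2];
    let fp = data.len() as i32;
    (data[argument_index(fp, 0)], data[argument_index(fp, 1)])
}
```

<p>With an offset of 0 the index lands on the last argument pushed, and
each larger offset steps one argument further back.</p>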
<p>Now we're down to the last two instructions: call and return.</p>
<h4 id="call">Call</h4><p>Call has a special dispatch for builtin functions (the only one that
exists being <code>print</code>).</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">Call</span><span class="p">(</span><span class="n">label</span><span class="p">,</span><span class="w"> </span><span class="n">narguments</span><span class="p">)</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Handle builtin functions</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"print"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="mi">0</span><span class="o">..*</span><span class="n">narguments</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="fm">print!</span><span class="p">(</span><span class="s">"{}"</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">());</span>
<span class="w"> </span><span class="fm">print!</span><span class="p">(</span><span class="s">" "</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="fm">println!</span><span class="p">();</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Otherwise it pushes the current frame pointer, then the program
counter, and finally the number of arguments (not locals) onto the
stack for preservation. Then it sets up the new program counter and
frame pointer and creates space for all locals and arguments after the
new frame pointer.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">[</span><span class="n">label</span><span class="p">].</span><span class="n">narguments</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">);</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">[</span><span class="n">label</span><span class="p">].</span><span class="n">location</span><span class="p">;</span>
<span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Set up space for all arguments/locals</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">nlocals</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">[</span><span class="n">label</span><span class="p">].</span><span class="n">nlocals</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">nlocals</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="n">nlocals</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<h4 id="return">Return</h4><p>The return instruction pops the return value from the stack. Then it
pops off all locals and arguments. Then it restores the program
counter and frame pointer, and pops off the arguments before the frame
pointer. Finally it adds the return value back onto the stack.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">Return</span><span class="w"> </span><span class="o">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">ret</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// Clean up the local stack</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Restore pc and fp</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">narguments</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// Clean up arguments</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">narguments</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">();</span>
<span class="w"> </span><span class="n">narguments</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Add back return value</span>
<span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">ret</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
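<p>As a sanity check on the bookkeeping, here is a standalone sketch of
a call followed by a return on a plain <code>Vec&lt;i32&gt;</code>. It
mirrors the steps described above but is not the post's code: the
free functions <code>call</code>, <code>ret</code>, and <code>round_trip</code> are
hypothetical, the target, argument count, and local count are passed
in directly rather than looked up in a symbol table, and
<code>truncate</code> stands in for the pop loops.</p>

```rust
// Mirrors the non-builtin path of Call: save state, then open a new frame.
fn call(
    data: &mut Vec<i32>,
    fp: &mut i32,
    pc: &mut i32,
    target: i32,
    narguments: i32,
    nlocals: i32,
) {
    data.push(*fp);
    data.push(*pc + 1);
    data.push(narguments);
    *pc = target;
    *fp = data.len() as i32;
    // Reserve zeroed slots for the callee's locals.
    for _ in 0..nlocals {
        data.push(0);
    }
}

// Mirrors Return: keep the result, unwind the frame, restore state.
fn ret(data: &mut Vec<i32>, fp: &mut i32, pc: &mut i32) {
    let result = data.pop().unwrap();
    data.truncate(*fp as usize); // drop everything from fp upward
    let narguments = data.pop().unwrap();
    *pc = data.pop().unwrap();
    *fp = data.pop().unwrap();
    let remaining = data.len() - narguments as usize;
    data.truncate(remaining); // drop the caller-pushed arguments
    data.push(result);
}

// Calls a two-argument function with one local and returns the final state.
fn round_trip() -> (Vec<i32>, i32, i32) {
    let mut data = vec![7, 8]; // two arguments already pushed by the caller
    let (mut fp, mut pc) = (0, 3);
    call(&mut data, &mut fp, &mut pc, 20, 2, 1);
    data.push(15); // the callee leaves its result on top of the stack
    ret(&mut data, &mut fp, &mut pc);
    (data, fp, pc)
}
```

<p>After the round trip the two arguments have been replaced by the
single return value, the frame pointer is back where it started, and
the program counter points at the instruction after the call.</p>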
<p>And yes, this implementation would be more efficient if instead of
literally pushing and popping we just incremented/decremented a
stack pointer.</p>
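<p>A rough sketch of what that could look like: a preallocated buffer
plus an index that moves up and down. The <code>Stack</code> type and its
methods are hypothetical and not part of this implementation, and
whether it actually beats <code>Vec</code> here is something to measure
rather than assume.</p>

```rust
// A fixed-capacity stack that moves a stack pointer instead of growing
// and shrinking a Vec. Hypothetical type for illustration only.
struct Stack {
    slots: Vec<i32>,
    sp: usize, // index of the next free slot
}

impl Stack {
    fn with_capacity(capacity: usize) -> Stack {
        Stack { slots: vec![0; capacity], sp: 0 }
    }

    fn push(&mut self, value: i32) {
        self.slots[self.sp] = value; // index check panics if the stack is full
        self.sp += 1;
    }

    fn pop(&mut self) -> i32 {
        self.sp -= 1;
        self.slots[self.sp]
    }

    // Discards `n` values in one step, e.g. a frame's locals on return.
    // No underflow handling beyond Rust's default integer checks.
    fn drop_n(&mut self, n: usize) {
        self.sp -= n;
    }
}

// Pushes three values, discards two, pushes one more, then pops it.
fn stack_example() -> (i32, usize) {
    let mut stack = Stack::with_capacity(16);
    stack.push(1);
    stack.push(2);
    stack.push(3);
    stack.drop_n(2);
    stack.push(9);
    let top = stack.pop();
    (top, stack.sp)
}
```

<p>The main difference from the pop loops in the return instruction is
that discarding a whole frame becomes a single subtraction.</p>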
<p>And that's it! We're completely done with a basic parser, compiler, and
virtual machine for a subset of Lua. Is it janky? Yeah. Is it simple?
Kind of? Does it work? It seems to!</p>
<h3 id="summary">Summary</h3><p>OK, we've got fewer than 1200 lines of Rust, enough to run some decent Lua
programs. We run this fib program against this implementation and
against Lua 5.4.3 (which isn't LuaJIT) and what do we see?</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cargo<span class="w"> </span>build<span class="w"> </span>--release
$<span class="w"> </span>cat<span class="w"> </span>test/fib.lua
<span class="k">function</span><span class="w"> </span>fib<span class="o">(</span>n<span class="o">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span>n<span class="w"> </span><<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="k">then</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span>n<span class="p">;</span>
<span class="w"> </span>end
<span class="w"> </span><span class="nb">local</span><span class="w"> </span><span class="nv">n1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>fib<span class="o">(</span>n-1<span class="o">)</span><span class="p">;</span>
<span class="w"> </span><span class="nb">local</span><span class="w"> </span><span class="nv">n2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>fib<span class="o">(</span>n-2<span class="o">)</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span>n1<span class="w"> </span>+<span class="w"> </span>n2<span class="p">;</span>
end
print<span class="o">(</span>fib<span class="o">(</span><span class="m">30</span><span class="o">))</span><span class="p">;</span>
$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>./target/release/lust<span class="w"> </span>test/fib.lua
<span class="m">832040</span>
./target/release/lust<span class="w"> </span>test/fib.lua<span class="w"> </span><span class="m">0</span>.29s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.00s<span class="w"> </span>system<span class="w"> </span><span class="m">99</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.293<span class="w"> </span>total
$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>lua<span class="w"> </span>test/fib.lua
<span class="m">832040</span>
lua<span class="w"> </span>test/fib.lua<span class="w"> </span><span class="m">0</span>.06s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.00s<span class="w"> </span>system<span class="w"> </span><span class="m">99</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.063<span class="w"> </span>total
</pre></div>
<p>This implementation is a bit slower: about 0.29s versus 0.06s, or nearly 5x! Time to do some profiling and maybe
revisit some of those aforementioned inefficiencies.</p>
<p class="note">
Big thanks to <a
href="https://twitter.com/christianfscott/status/1475832498663792640">Christian
Scott on Twitter</a> for pointing out I should not be benchmarking
with debug builds!
<br /><br />
And thanks
to <a href="https://www.reddit.com/r/rust/comments/rqgm8t/comment/hqbwgwj/">reddit123123123123
on Reddit</a> for suggesting I use <code>cargo clippy</code> to
clean up my code.
<br /><br />
Thanks to <a href="https://github.com/eatonphil/lust/issues/1">GiffE
on Github</a> for pointing out some key inconsistencies between this
implementation and Lua. I won't modify anything because a perfect
Lua subset wasn't the goal, but I'm sharing because it was good
analysis and criticism of this implementation.
</p><p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new blog post on parsing, compiling, and virtual machine evaluation for a super minimal Lua implementation written from scratch in Rust!<a href="https://t.co/8qFviEecJo">https://t.co/8qFviEecJo</a> <a href="https://t.co/d1MGArlErR">pic.twitter.com/d1MGArlErR</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1475828516835008513?ref_src=twsrc%5Etfw">December 28, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/lua-in-rust.htmlTue, 28 Dec 2021 00:00:00 +0000
- Running SQL Server in a container on Github Actionshttp://notes.eatonphil.com/sqlserver-in-github-actions.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-12-16-sqlserver-in-github-actions.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2021-12-16-sqlserver-in-github-actions.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/sqlserver-in-github-actions.htmlThu, 16 Dec 2021 00:00:00 +0000
- Implementing zip archiving in Golang: unzippinghttp://notes.eatonphil.com/implementing-zip-in-go-unzipping.html<p><small>All code for this post is <a href="https://github.com/eatonphil/gozip">available on Github</a>.</small></p>
<p>Let's take a look at how zip files work. Take a small file for example:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>hello.text
Hello!
</pre></div>
<p>Let's zip it up.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>zip<span class="w"> </span>test.zip<span class="w"> </span>hello.text
adding:<span class="w"> </span>hello.text<span class="w"> </span><span class="o">(</span>stored<span class="w"> </span><span class="m">0</span>%<span class="o">)</span>
$<span class="w"> </span>ls<span class="w"> </span>-lah<span class="w"> </span>test.zip
-rw-r--r--<span class="w"> </span><span class="m">1</span><span class="w"> </span>phil<span class="w"> </span>phil<span class="w"> </span><span class="m">177</span><span class="w"> </span>Nov<span class="w"> </span><span class="m">23</span><span class="w"> </span><span class="m">23</span>:04<span class="w"> </span>test.zip
</pre></div>
<p>So a 7 byte text file becomes a 177 byte zip file. That is pretty
small! Parsing 177 bytes sounds like it can't possibly be too
complicated!</p>
<p>Let's hexdump the zip file.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>hexdump<span class="w"> </span>-C<span class="w"> </span>test.zip
<span class="m">00000000</span><span class="w"> </span><span class="m">50</span><span class="w"> </span>4b<span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">04</span><span class="w"> </span>0a<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>8a<span class="w"> </span>b8<span class="w"> </span><span class="m">77</span><span class="w"> </span><span class="m">53</span><span class="w"> </span>9e<span class="w"> </span>d8<span class="w"> </span><span class="p">|</span>PK..........wS..<span class="p">|</span>
<span class="m">00000010</span><span class="w"> </span><span class="m">42</span><span class="w"> </span>b0<span class="w"> </span><span class="m">07</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">07</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>0a<span class="w"> </span><span class="m">00</span><span class="w"> </span>1c<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">68</span><span class="w"> </span><span class="m">65</span><span class="w"> </span><span class="p">|</span>B.............he<span class="p">|</span>
<span class="m">00000020</span><span class="w"> </span>6c<span class="w"> </span>6c<span class="w"> </span>6f<span class="w"> </span>2e<span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="m">65</span><span class="w"> </span><span class="m">78</span><span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="m">55</span><span class="w"> </span><span class="m">54</span><span class="w"> </span><span class="m">09</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="m">73</span><span class="w"> </span>9d<span class="w"> </span><span class="p">|</span>llo.textUT...ts.<span class="p">|</span>
<span class="m">00000030</span><span class="w"> </span><span class="m">61</span><span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="m">73</span><span class="w"> </span>9d<span class="w"> </span><span class="m">61</span><span class="w"> </span><span class="m">75</span><span class="w"> </span><span class="m">78</span><span class="w"> </span>0b<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="m">04</span><span class="w"> </span>eb<span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">04</span><span class="w"> </span><span class="p">|</span>ats.aux.........<span class="p">|</span>
<span class="m">00000040</span><span class="w"> </span>eb<span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">65</span><span class="w"> </span>6c<span class="w"> </span>6c<span class="w"> </span>6f<span class="w"> </span><span class="m">21</span><span class="w"> </span>0a<span class="w"> </span><span class="m">50</span><span class="w"> </span>4b<span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="m">02</span><span class="w"> </span>1e<span class="w"> </span><span class="p">|</span>....Hello!.PK...<span class="p">|</span>
<span class="m">00000050</span><span class="w"> </span><span class="m">03</span><span class="w"> </span>0a<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>8a<span class="w"> </span>b8<span class="w"> </span><span class="m">77</span><span class="w"> </span><span class="m">53</span><span class="w"> </span>9e<span class="w"> </span>d8<span class="w"> </span><span class="m">42</span><span class="w"> </span>b0<span class="w"> </span><span class="m">07</span><span class="w"> </span><span class="p">|</span>.........wS..B..<span class="p">|</span>
<span class="m">00000060</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">07</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>0a<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">18</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="p">|</span>................<span class="p">|</span>
<span class="m">00000070</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>a4<span class="w"> </span><span class="m">81</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">68</span><span class="w"> </span><span class="m">65</span><span class="w"> </span>6c<span class="w"> </span>6c<span class="w"> </span>6f<span class="w"> </span>2e<span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="p">|</span>.........hello.t<span class="p">|</span>
<span class="m">00000080</span><span class="w"> </span><span class="m">65</span><span class="w"> </span><span class="m">78</span><span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="m">55</span><span class="w"> </span><span class="m">54</span><span class="w"> </span><span class="m">05</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="m">73</span><span class="w"> </span>9d<span class="w"> </span><span class="m">61</span><span class="w"> </span><span class="m">75</span><span class="w"> </span><span class="m">78</span><span class="w"> </span>0b<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="p">|</span>extUT...ts.aux..<span class="p">|</span>
<span class="m">00000090</span><span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="m">04</span><span class="w"> </span>eb<span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">04</span><span class="w"> </span>eb<span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">50</span><span class="w"> </span>4b<span class="w"> </span><span class="m">05</span><span class="w"> </span><span class="m">06</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="p">|</span>...........PK...<span class="p">|</span>
000000a0<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">50</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>4b<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="p">|</span>.......P...K....<span class="p">|</span>
000000b0<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="p">|</span>.<span class="p">|</span>
000000b1
</pre></div>
<p>We can see both the file name and the file contents in there.</p>
<h3 id="structure">Structure</h3><p>Let's take a look at the zip structure defined in
<a href="https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT">PKWARE's APPNOTE.TXT</a>. Based
on section 4.3.6 it looks like each file is stored as a block of
metadata followed by that file's contents, one entry after another,
with a final chunk of "central directory" metadata at the end of the archive.</p>
<div style="text-align:center">
<img src="https://www.codeproject.com/KB/cs/remotezip/diagram1.png" style="height:400px; width: auto" />
<div>
<small><a href="https://www.codeproject.com/Articles/8688/Extracting-files-from-a-remote-ZIP-archive">Image Credit</a></small>
</div>
</div><p>The local header metadata looks like this:</p>
<table>
<thead><tr>
<th>Field</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>local file header signature</td>
<td>4 bytes</td>
</tr>
<tr>
<td>version needed to extract</td>
<td>2 bytes</td>
</tr>
<tr>
<td>general purpose bit flag</td>
<td>2 bytes</td>
</tr>
<tr>
<td>compression method</td>
<td>2 bytes</td>
</tr>
<tr>
<td>last mod file time</td>
<td>2 bytes</td>
</tr>
<tr>
<td>last mod file date</td>
<td>2 bytes</td>
</tr>
<tr>
<td>crc-32</td>
<td>4 bytes</td>
</tr>
<tr>
<td>compressed size</td>
<td>4 bytes</td>
</tr>
<tr>
<td>uncompressed size</td>
<td>4 bytes</td>
</tr>
<tr>
<td>file name length</td>
<td>2 bytes</td>
</tr>
<tr>
<td>extra field length</td>
<td>2 bytes</td>
</tr>
<tr>
<td>file name</td>
<td>variable</td>
</tr>
<tr>
<td>extra field</td>
<td>variable</td>
</tr>
</tbody>
</table>
<p>The header signature is a single integer (<code>0x04034b50</code>) in
a valid zip file. Integers in a zip file are little-endian, so on disk
it shows up as the bytes <code>50 4b 03 04</code>. We'll ignore version,
the general purpose flag, and the checksum. The spec lists many
compression methods but we'll only handle <code>0</code> for no compression
and <code>8</code> for DEFLATE.</p>
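<p>As a quick sanity check on the table above: the fixed-size fields add
up to 30 bytes. Here is a minimal sketch, separate from the parser we
build in this post, that decodes those 30 bytes in one go
with <code>encoding/binary</code>. The names <code>fixedLocalHeader</code>
and <code>decodeFixedLocalHeader</code> are made up for this example, and
the sample bytes are hand-built rather than taken from a real archive.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// localFileHeaderSignature is the integer that starts every local file
// header. It is stored little-endian, so the bytes on disk are 50 4b 03 04.
const localFileHeaderSignature = 0x04034b50

// fixedLocalHeader mirrors the fixed-size fields of the table, in order.
// The type is hypothetical. Fields are exported because encoding/binary
// requires that when decoding into a struct.
type fixedLocalHeader struct {
	Signature        uint32
	Version          uint16
	BitFlag          uint16
	Compression      uint16
	LastModTime      uint16
	LastModDate      uint16
	CRC32            uint32
	CompressedSize   uint32
	UncompressedSize uint32
	FileNameLength   uint16
	ExtraFieldLength uint16
}

// decodeFixedLocalHeader reads the 30 fixed bytes at the start of bs and
// rejects input that does not begin with the local file header signature.
func decodeFixedLocalHeader(bs []byte) (fixedLocalHeader, error) {
	var h fixedLocalHeader
	if err := binary.Read(bytes.NewReader(bs), binary.LittleEndian, &h); err != nil {
		return h, err
	}
	if h.Signature != localFileHeaderSignature {
		return h, errors.New("not a local file header")
	}
	return h, nil
}

func main() {
	// A hand-built header for an uncompressed 5-byte file with a 5-byte name.
	raw := []byte{
		0x50, 0x4b, 0x03, 0x04, // local file header signature
		0x0a, 0x00, // version needed to extract
		0x00, 0x00, // general purpose bit flag
		0x00, 0x00, // compression method (0 = no compression)
		0x00, 0x00, // last mod file time
		0x21, 0x00, // last mod file date
		0x00, 0x00, 0x00, 0x00, // crc-32 (left as zero in this sketch)
		0x05, 0x00, 0x00, 0x00, // compressed size
		0x05, 0x00, 0x00, 0x00, // uncompressed size
		0x05, 0x00, // file name length
		0x00, 0x00, // extra field length
	}
	h, err := decodeFixedLocalHeader(raw)
	if err != nil {
		panic(err)
	}
	// Prints: 30 0 5 5
	fmt.Println(binary.Size(h), h.Compression, h.UncompressedSize, h.FileNameLength)
}
```

<p>The variable-length file name and extra field follow immediately
after these 30 bytes, which is why the two length fields come last.</p>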
<p>The last modified time and date are stored in MS-DOS date/time format, which is
<a href="https://groups.google.com/g/comp.os.msdos.programmer/c/ffAVUFN2NbA">pretty
funky</a>.</p>
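<p>To make "funky" concrete: the MS-DOS format packs the date into one
16-bit value and the time into another, with seconds stored at two-second
resolution. Here is a rough sketch of the bit layout. The helper
name <code>dosDateTimeToTime</code> is made up for this example, and UTC is
an assumption since the format carries no time zone.</p>

```go
package main

import (
	"fmt"
	"time"
)

// dosDateTimeToTime is a hypothetical helper that unpacks the two 16-bit
// MS-DOS fields. The format has no time zone, so UTC is assumed here.
func dosDateTimeToTime(date, tm uint16) time.Time {
	year := int(date>>9) + 1980            // bits 9-15: years since 1980
	month := time.Month((date >> 5) & 0x0f) // bits 5-8: month, 1-12
	day := int(date & 0x1f)                // bits 0-4: day of month
	hour := int(tm >> 11)                  // bits 11-15: hour
	minute := int((tm >> 5) & 0x3f)        // bits 5-10: minute
	second := int(tm&0x1f) * 2             // bits 0-4: seconds divided by two
	return time.Date(year, month, day, hour, minute, second, 0, time.UTC)
}

func main() {
	// 0x5377 encodes 2021-11-23 and 0x73c5 encodes 14:30:10.
	// Prints: 2021-11-23 14:30:10 +0000 UTC
	fmt.Println(dosDateTimeToTime(0x5377, 0x73c5))
}
```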
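<p>Let's translate this roughly to Go with some high-level flourishes: a
single <code>time.Time</code> instead of the raw time and date fields, and
strings for the file name and contents.</p>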
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"bytes"</span>
<span class="w"> </span><span class="s">"compress/flate"</span>
<span class="w"> </span><span class="s">"io/ioutil"</span>
<span class="w"> </span><span class="s">"encoding/binary"</span>
<span class="w"> </span><span class="s">"time"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">compression</span><span class="w"> </span><span class="kt">uint8</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">noCompression</span><span class="w"> </span><span class="nx">compression</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>
<span class="w"> </span><span class="nx">deflateCompression</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">localFileHeader</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">signature</span><span class="w"> </span><span class="kt">uint32</span>
<span class="w"> </span><span class="nx">version</span><span class="w"> </span><span class="kt">uint16</span>
<span class="w"> </span><span class="nx">bitFlag</span><span class="w"> </span><span class="kt">uint16</span>
<span class="w"> </span><span class="nx">compression</span><span class="w"> </span><span class="nx">compression</span>
<span class="w"> </span><span class="nx">lastModified</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span>
<span class="w"> </span><span class="nx">crc32</span><span class="w"> </span><span class="kt">uint32</span>
<span class="w"> </span><span class="nx">compressedSize</span><span class="w"> </span><span class="kt">uint32</span>
<span class="w"> </span><span class="nx">uncompressedSize</span><span class="w"> </span><span class="kt">uint32</span>
<span class="w"> </span><span class="nx">fileName</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">extraField</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="w"> </span><span class="nx">fileContents</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
</pre></div>
<h3 id="main">main</h3><p>Our entrypoint will read a zip file and keep walking through the file
until we stop being able to parse zip file entries.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ioutil</span><span class="p">.</span><span class="nx">ReadFile</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">f</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">lfh</span><span class="w"> </span><span class="o">*</span><span class="nx">localFileHeader</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">next</span><span class="w"> </span><span class="kt">int</span>
<span class="w"> </span><span class="nx">lfh</span><span class="p">,</span><span class="w"> </span><span class="nx">next</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseLocalFileHeader</span><span class="p">(</span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">end</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">errNotZip</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">next</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">lfh</span><span class="p">.</span><span class="nx">lastModified</span><span class="p">,</span><span class="w"> </span><span class="nx">lfh</span><span class="p">.</span><span class="nx">fileName</span><span class="p">,</span><span class="w"> </span><span class="nx">lfh</span><span class="p">.</span><span class="nx">fileContents</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<h3 id="files">Files</h3><p>For each file we'll fail early if the first four bytes are not the magic zip signature.</p>
<div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">errNotZip</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Not a zip file"</span><span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">parseLocalFileHeader</span><span class="p">(</span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">start</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">localFileHeader</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">signature</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readUint32</span><span class="p">(</span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">start</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">signature</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mh">0x04034b50</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">errNotZip</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>The basic pattern is that each of these read helpers takes the byte
slice and an offset and returns a Go value, the offset just past that
value, and an error. The read helpers do the bounds checking. We'll
define them further down.</p>
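<p>To give a feel for that pattern before we get to the real helpers,
here is a minimal sketch of what two of them could look like. The
names <code>sketchReadUint16</code>, <code>sketchReadString</code>
and <code>errShortBuffer</code> are made up so they don't clash with the
actual definitions, which may differ in detail.</p>

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// errShortBuffer is a hypothetical error for reads past the end of the slice.
var errShortBuffer = errors.New("read past end of buffer")

// sketchReadUint16 decodes a little-endian uint16 at offset and returns the
// value together with the offset of the byte that follows it.
func sketchReadUint16(bs []byte, offset int) (uint16, int, error) {
	end := offset + 2
	if offset < 0 || end > len(bs) {
		return 0, 0, errShortBuffer
	}
	return binary.LittleEndian.Uint16(bs[offset:end]), end, nil
}

// sketchReadString copies n bytes starting at offset into a string and
// returns it together with the offset just past those bytes.
func sketchReadString(bs []byte, offset, n int) (string, int, error) {
	end := offset + n
	if offset < 0 || n < 0 || end > len(bs) {
		return "", 0, errShortBuffer
	}
	return string(bs[offset:end]), end, nil
}

func main() {
	// A 2-byte length followed by that many bytes of text.
	bs := []byte{0x05, 0x00, 'h', 'e', 'l', 'l', 'o'}
	n, i, err := sketchReadUint16(bs, 0)
	if err != nil {
		panic(err)
	}
	s, i, err := sketchReadString(bs, i, int(n))
	if err != nil {
		panic(err)
	}
	// Prints: 5 hello 7
	fmt.Println(n, s, i)
}
```

<p>Threading the offset through each call like this keeps the parser a
straight line of reads, at the cost of a repetitive error check after
each one.</p>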
<p>Let's follow the same pattern to the end of the struct:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nv">version</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nv">bitFlag</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nv">compression</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nv">noCompression</span>
<span class="w"> </span><span class="nv">compressionRaw</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">compressionRaw</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">compression</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">deflateCompression</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nv">lmTime</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nv">lmDate</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nv">lastModified</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">msdosTimeToGoTime</span><span class="p">(</span><span class="nv">lmDate</span><span class="p">,</span><span class="w"> </span><span class="nv">lmTime</span><span class="p">)</span>
<span class="w"> </span><span class="nv">crc32</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint32</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nv">compressedSize</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint32</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nv">uncompressedSize</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint32</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nv">fileNameLength</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nv">extraFieldLength</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nv">fileName</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readString</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nf">int</span><span class="p">(</span><span class="nv">fileNameLength</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nv">extraField</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readBytes</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nf">int</span><span class="p">(</span><span class="nv">extraFieldLength</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Now if the file contents are uncompressed we can just copy
<code>uncompressedSize</code> bytes after the file header. If the file
contents are compressed, though, we'll hand the <code>compressedSize</code>
bytes after the file header to Go's built-in DEFLATE support
(<code>compress/flate</code>) to decompress them.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">fileContents</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">compression</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">noCompression</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fileContents</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">readString</span><span class="p">(</span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">uncompressedSize</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">compressedSize</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">errOverranBuffer</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">flateReader</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">flate</span><span class="p">.</span><span class="nx">NewReader</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">NewReader</span><span class="p">(</span><span class="nx">bs</span><span class="p">[</span><span class="nx">i</span><span class="p">:</span><span class="nx">end</span><span class="p">]))</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">flateReader</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="nx">read</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ioutil</span><span class="p">.</span><span class="nx">ReadAll</span><span class="p">(</span><span class="nx">flateReader</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fileContents</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">read</span><span class="p">)</span>
<span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">end</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
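<p>If you want to poke at the DEFLATE step on its own, here is a small
round-trip sketch using <code>compress/flate</code>. Zip method 8 stores a
raw DEFLATE stream, which is what this package reads and writes. The
helper names <code>deflate</code> and <code>inflate</code> are made up for
this example, and it uses <code>io.ReadAll</code>, the newer equivalent
of <code>ioutil.ReadAll</code>.</p>

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// deflate compresses data into a raw DEFLATE stream held in memory.
func deflate(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(data); err != nil {
		return nil, err
	}
	// Close flushes any pending data and writes the final block.
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// inflate decompresses a raw DEFLATE stream held in memory.
func inflate(compressed []byte) ([]byte, error) {
	r := flate.NewReader(bytes.NewReader(compressed))
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	compressed, err := deflate([]byte("Hello!\n"))
	if err != nil {
		panic(err)
	}
	plain, err := inflate(compressed)
	if err != nil {
		panic(err)
	}
	// Prints: "Hello!\n"
	fmt.Printf("%q\n", plain)
}
```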
<p>And return the filled-out representation along with the offset where the next entry starts:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">localFileHeader</span><span class="p">{</span>
<span class="w"> </span><span class="nx">signature</span><span class="p">:</span><span class="w"> </span><span class="nx">signature</span><span class="p">,</span>
<span class="w"> </span><span class="nx">version</span><span class="p">:</span><span class="w"> </span><span class="nx">version</span><span class="p">,</span>
<span class="w"> </span><span class="nx">bitFlag</span><span class="p">:</span><span class="w"> </span><span class="nx">bitFlag</span><span class="p">,</span>
<span class="w"> </span><span class="nx">compression</span><span class="p">:</span><span class="w"> </span><span class="nx">compression</span><span class="p">,</span>
<span class="w"> </span><span class="nx">lastModified</span><span class="p">:</span><span class="w"> </span><span class="nx">lastModified</span><span class="p">,</span>
<span class="w"> </span><span class="nx">crc32</span><span class="p">:</span><span class="w"> </span><span class="nx">crc32</span><span class="p">,</span>
<span class="w"> </span><span class="nx">compressedSize</span><span class="p">:</span><span class="w"> </span><span class="nx">compressedSize</span><span class="p">,</span>
<span class="w"> </span><span class="nx">uncompressedSize</span><span class="p">:</span><span class="w"> </span><span class="nx">uncompressedSize</span><span class="p">,</span>
<span class="w"> </span><span class="nx">fileName</span><span class="p">:</span><span class="w"> </span><span class="nx">fileName</span><span class="p">,</span>
<span class="w"> </span><span class="nx">extraField</span><span class="p">:</span><span class="w"> </span><span class="nx">extraField</span><span class="p">,</span>
<span class="w"> </span><span class="nx">fileContents</span><span class="p">:</span><span class="w"> </span><span class="nx">fileContents</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<h3 id="read-helpers">Read helpers</h3><p>Now we just define those read helpers with bounds checking, using Go's
builtin libraries for dealing with binary encodings.</p>
<div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">errOverranBuffer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Overran buffer"</span><span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">readUint32</span><span class="p">(</span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">uint32</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">4</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">errOverranBuffer</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint32</span><span class="p">(</span><span class="nx">bs</span><span class="p">[</span><span class="nx">offset</span><span class="p">:</span><span class="nx">end</span><span class="p">]),</span><span class="w"> </span><span class="nx">end</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">readUint16</span><span class="p">(</span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">uint16</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">offset</span><span class="o">+</span><span class="mi">2</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">errOverranBuffer</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint16</span><span class="p">(</span><span class="nx">bs</span><span class="p">[</span><span class="nx">offset</span><span class="p">:</span><span class="nx">end</span><span class="p">]),</span><span class="w"> </span><span class="nx">end</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>Grabbing bytes and strings needs basically nothing but bounds checking.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">n</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">errOverranBuffer</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="nx">offset</span><span class="p">:</span><span class="nx">offset</span><span class="o">+</span><span class="nx">n</span><span class="p">],</span><span class="w"> </span><span class="nx">end</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">readString</span><span class="p">(</span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">read</span><span class="p">,</span><span class="w"> </span><span class="nx">end</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">read</span><span class="p">),</span><span class="w"> </span><span class="nx">end</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="p">}</span>
</pre></div>
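<p>To make the offset threading concrete, here is a minimal sketch of
how these helpers chain together. The two read helpers are repeated so
the example stands alone; <code>parsePrefix</code> is a hypothetical name that
is not part of the reader above, and it only reads the first two
fields of a local file header.</p>

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

var errOverranBuffer = errors.New("overran buffer")

// readUint32 and readUint16 mirror the helpers above: each returns the
// decoded value plus the offset just past it.
func readUint32(bs []byte, offset int) (uint32, int, error) {
	end := offset + 4
	if end > len(bs) {
		return 0, 0, errOverranBuffer
	}
	return binary.LittleEndian.Uint32(bs[offset:end]), end, nil
}

func readUint16(bs []byte, offset int) (uint16, int, error) {
	end := offset + 2
	if end > len(bs) {
		return 0, 0, errOverranBuffer
	}
	return binary.LittleEndian.Uint16(bs[offset:end]), end, nil
}

// parsePrefix is a hypothetical helper for illustration only. It reads
// the signature and version, feeding each returned offset into the
// next read.
func parsePrefix(bs []byte) (signature uint32, version uint16, next int, err error) {
	signature, next, err = readUint32(bs, 0)
	if err != nil {
		return 0, 0, 0, err
	}
	version, next, err = readUint16(bs, next)
	if err != nil {
		return 0, 0, 0, err
	}
	return signature, version, next, nil
}

func main() {
	// "PK\x03\x04" followed by a little-endian version of 20.
	bs := []byte{0x50, 0x4b, 0x03, 0x04, 0x14, 0x00}
	sig, ver, next, err := parsePrefix(bs)
	fmt.Printf("signature=%#x version=%d next=%d err=%v\n", sig, ver, next, err)

	// One byte short: the second read trips the bounds check.
	_, _, _, err = parsePrefix(bs[:5])
	fmt.Println(err)
}
```

<p>Returning the next offset from every helper means the caller never
does its own arithmetic on field widths.</p>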
<h3 id="msdos-time">MSDOS time</h3><p>At the time zip was created, MSDOS time format was popular, I
guess. But it's not popular today so it took a bit of work to finally
find <a href="https://groups.google.com/g/comp.os.msdos.programmer/c/ffAVUFN2NbA">an explanation of the
format</a>
with some code (in C).</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">msdosTimeToGoTime</span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="kt">uint16</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="kt">uint16</span><span class="p">)</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">seconds</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">((</span><span class="nx">t</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0x1F</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span>
<span class="w"> </span><span class="nx">minutes</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">((</span><span class="nx">t</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">5</span><span class="p">)</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0x3F</span><span class="p">)</span>
<span class="w"> </span><span class="nx">hours</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">11</span><span class="p">)</span>
<span class="w"> </span><span class="nx">day</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0x1F</span><span class="p">)</span>
<span class="w"> </span><span class="nx">month</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Month</span><span class="p">((</span><span class="nx">d</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">5</span><span class="p">)</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0x0F</span><span class="p">)</span>
<span class="w"> </span><span class="nx">year</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">((</span><span class="nx">d</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">9</span><span class="p">)</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0x7F</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1980</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Date</span><span class="p">(</span><span class="nx">year</span><span class="p">,</span><span class="w"> </span><span class="nx">month</span><span class="p">,</span><span class="w"> </span><span class="nx">day</span><span class="p">,</span><span class="w"> </span><span class="nx">hours</span><span class="p">,</span><span class="w"> </span><span class="nx">minutes</span><span class="p">,</span><span class="w"> </span><span class="nx">seconds</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Local</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<h3 id="tout-ensemble">Tout ensemble</h3><p>Running it we get:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build
$<span class="w"> </span>./gozip<span class="w"> </span>test.zip
<span class="m">2021</span>-11-23<span class="w"> </span><span class="m">23</span>:04:20<span class="w"> </span>+0000<span class="w"> </span>UTC<span class="w"> </span>hello.text<span class="w"> </span>Hello!
</pre></div>
<p>That looks good! Now let's try zipping more than one file.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>bye.text
Au<span class="w"> </span>revoir!
$<span class="w"> </span>rm<span class="w"> </span>test.zip
$<span class="w"> </span>zip<span class="w"> </span>test.zip<span class="w"> </span>*.text
<span class="w"> </span>adding:<span class="w"> </span>bye.text<span class="w"> </span><span class="o">(</span>stored<span class="w"> </span><span class="m">0</span>%<span class="o">)</span>
<span class="w"> </span>adding:<span class="w"> </span>hello.text<span class="w"> </span><span class="o">(</span>stored<span class="w"> </span><span class="m">0</span>%<span class="o">)</span>
$<span class="w"> </span>./gozip<span class="w"> </span>test.zip
<span class="m">2021</span>-11-24<span class="w"> </span><span class="m">03</span>:40:00<span class="w"> </span>+0000<span class="w"> </span>UTC<span class="w"> </span>bye.text<span class="w"> </span>Au<span class="w"> </span>revoir!
<span class="m">2021</span>-11-23<span class="w"> </span><span class="m">23</span>:04:20<span class="w"> </span>+0000<span class="w"> </span>UTC<span class="w"> </span>hello.text<span class="w"> </span>Hello!
</pre></div>
<p>Fab.</p>
<h3 id="notes">Notes</h3><p>There are many parts of the standard to deal with (e.g. directories)
and many common extensions. I'm ignoring them.</p>
<p>There's some space left at the end of the file which is probably the
"central directory" metadata but I haven't dug into
that. Understanding those last remaining bits is probably necessary
if I want to be able to <em>create</em> zip archives.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new post on building a zip archive reader in Go!<a href="https://t.co/U0Yg2powlP">https://t.co/U0Yg2powlP</a> <a href="https://t.co/ns5dF3mjIx">pic.twitter.com/ns5dF3mjIx</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1463354752675323904?ref_src=twsrc%5Etfw">November 24, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/implementing-zip-in-go-unzipping.htmlTue, 23 Nov 2021 00:00:00 +0000
- Benchmarking esbuild, swc, tsc, and babel for React/JSX projectshttp://notes.eatonphil.com/benchmarking-esbuild-swc-typescript-babel.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-11-13-benchmarking-esbuild-swc-typescript-babel.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2021-11-13-benchmarking-esbuild-swc-typescript-babel.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/benchmarking-esbuild-swc-typescript-babel.htmlSat, 13 Nov 2021 00:00:00 +0000
- Building a fast SCSS-like rule expander for CSS using fuzzy parsinghttp://notes.eatonphil.com/building-a-nested-css-rule-expander.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-10-31-building-a-nested-css-rule-expander.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2021-10-31-building-a-nested-css-rule-expander.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/building-a-nested-css-rule-expander.htmlSun, 31 Oct 2021 00:00:00 +0000
- Exploring PL/pgSQL part two: implementing a Forth-like interpreterhttp://notes.eatonphil.com/exploring-plpgsql-forth-like.html<p class="note">
Previously in exploring PL/pgSQL:
<br />
<a href="exploring-plpgsql.html">Strings, arrays, recursion and parsing JSON</a>
</p><p>In my <a href="https://notes.eatonphil.com/exploring-plpgsql.html">last post</a>
I walked through the basics of PL/pgSQL, the embedded procedural
language inside of PostgreSQL. It covered simple functions, recursion
and parsing. But there was something very obviously missing from that
post: a working interpreter.</p>
<p>So in this post we'll walk through building a Forth-like language from
scratch in PL/pgSQL. We'll be able to write a Fibonacci function in
this Forth-like language and have it be evaluated correctly like so:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">"SELECT sm_run('</span>
<span class="s2">DEF fib</span>
<span class="s2"> DUP 1 > IF</span>
<span class="s2"> 1- DUP 1- fib CALL SWAP fib CALL + THEN</span>
<span class="s2"> RET</span>
<span class="s2">20 fib CALL</span>
<span class="s2">EXIT')"</span>
...
<span class="w"> </span>sm_run
--------
<span class="w"> </span><span class="m">6765</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span>
</pre></div>
<p>All code is available on <a href="https://github.com/eatonphil/exploring-plpgsql/blob/main/sm.sql">GitHub</a>.</p>
<h3 id="forth">Forth</h3><p><a href="https://www.forth.com/resources/forth-programming-language/">Forth</a>
is a stack-oriented language. Literals are pushed onto the stack.
Functions and builtins operate on the stack.</p>
<p>For example:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">"SELECT sm_run('3 2 + EXIT')"</span>
</pre></div>
<p>Will produce <code>5</code>. And:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">"SELECT sm_run('3 2 + 1 - EXIT')"</span>
</pre></div>
<p>Will produce <code>4</code>.</p>
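<p>If the stack model is new to you, here is the same idea as a tiny Go
sketch. <code>evalArithmetic</code> is a hypothetical, much-reduced evaluator
(integers, <code>+</code> and <code>-</code> only, no <code>EXIT</code>). It is not how the
PL/pgSQL version is written, just the shape of the evaluation.</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// evalArithmetic is a hypothetical helper for illustration. Integers
// are pushed; + and - pop two values and push the result. The final
// top of the stack is returned.
func evalArithmetic(program string) (int, error) {
	var stack []int
	for _, token := range strings.Fields(program) {
		switch token {
		case "+", "-":
			if len(stack) < 2 {
				return 0, fmt.Errorf("stack underflow at %q", token)
			}
			a, b := stack[len(stack)-2], stack[len(stack)-1]
			stack = stack[:len(stack)-2]
			if token == "+" {
				stack = append(stack, a+b)
			} else {
				stack = append(stack, a-b)
			}
		default:
			n, err := strconv.Atoi(token)
			if err != nil {
				return 0, fmt.Errorf("unknown token %q", token)
			}
			stack = append(stack, n)
		}
	}
	if len(stack) == 0 {
		return 0, fmt.Errorf("empty stack")
	}
	return stack[len(stack)-1], nil
}

func main() {
	for _, program := range []string{"3 2 +", "3 2 + 1 -"} {
		result, err := evalArithmetic(program)
		fmt.Println(program, "=>", result, err)
	}
}
```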
<p>Our code will notably not be a real Forth, since there are many
special features of a real Forth. But it will look like one to a
novice Forth programmer like myself.</p>
<p>You can read more about Forth basics
<a href="https://skilldrick.github.io/easyforth/">here</a>. And you can read a
truly stunning, real Forth implementation in
<a href="https://github.com/nornagon/jonesforth/blob/master/jonesforth.S">jonesforth.S</a>. Or
you can pick up <a href="https://letoverlambda.com/">Let Over Lambda</a> for a
fantastic book on Common Lisp that culminates in a Forth interpreter.</p>
<h3 id="implementation">Implementation</h3><p>Since the builtin <code>array_length($arr, $dim)</code> returns <code>NULL</code> if the
array is <code>NULL</code> or empty, and our dimension is always 1, we'll write a
helper that returns 0 in those cases.</p>
<div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">sm_alength</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">sm_alength</span><span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="nb">text</span><span class="p">[])</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<p>We'll also need to bring in the <code>hstore</code> extension so we can map
function names to their positions. (We could use an association list
but those are less programmer-friendly.)</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">hstore</span><span class="p">;</span>
</pre></div>
<p>Our interpreter function will take a string to evaluate, splitting the
string on whitespace into tokens.</p>
<div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">sm_run</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">sm_run</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">tokens</span><span class="w"> </span><span class="nb">text</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">regexp_split_to_array</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="s1">'\s+'</span><span class="p">);</span>
<span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="nb">text</span><span class="p">[];</span><span class="w"> </span><span class="c1">-- Data stack</span>
<span class="w"> </span><span class="n">defs</span><span class="w"> </span><span class="n">hstore</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Map of functions to location</span>
<span class="w"> </span><span class="n">tmps</span><span class="w"> </span><span class="nb">text</span><span class="p">[];</span><span class="w"> </span><span class="c1">-- Array we can use for temporary variables</span>
<span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="nb">text</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Current token</span>
<span class="w"> </span><span class="n">rps</span><span class="w"> </span><span class="nb">text</span><span class="p">[];</span><span class="w"> </span><span class="c1">-- Return pointer stack, always ints but easier to store as text</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Program counter</span>
<span class="k">BEGIN</span>
</pre></div>
<p>We set up a <code>tmps</code> array because each builtin may need a different
number of temporary variables and PL/pgSQL makes ad-hoc variables
cumbersome (or at least an easier way exists outside my knowledge).</p>
<p>And we store the return pointer stack as a text array so that we can
use <code>sm_alength</code> on it even though values in this array will always be
integers.</p>
<p>Next we'll start an infinite loop to evaluate the program. The only thing
that will stop the loop is the <code>EXIT</code> builtin, which returns from
this function with the top of the stack.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="k">true</span><span class="w"> </span><span class="n">LOOP</span>
<span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">pc</span><span class="p">];</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="n">NOTICE</span><span class="w"> </span><span class="s1">'[Debug] Current token: %. Current stack: %.'</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">stack</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">'PC out of bounds.'</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'EXIT'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="n">TODO</span><span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">stack</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">);</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<p>If no other condition is met (the token is not a builtin), we push it
onto the data stack and increment the program counter.</p>
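<p>For comparison, here is the same loop skeleton as a Go sketch. It is
a hypothetical rendering for illustration (<code>run</code> and
<code>errPCOutOfBounds</code> are my names), with <code>EXIT</code> as the only
builtin and every other token pushed as-is.</p>

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

var errPCOutOfBounds = errors.New("PC out of bounds")

// run is a hypothetical Go rendering of the loop skeleton above.
func run(program string) (string, error) {
	tokens := strings.Fields(program)
	var stack []string
	pc := 0 // Go slices start at 0; the PL/pgSQL array starts at 1.
	for {
		if pc >= len(tokens) {
			// Ran off the end without ever seeing EXIT.
			return "", errPCOutOfBounds
		}
		token := tokens[pc]
		if token == "EXIT" {
			if len(stack) == 0 {
				// The PL/pgSQL version would return NULL here.
				return "", nil
			}
			return stack[len(stack)-1], nil
		}
		// Not a builtin: push it and move on.
		stack = append(stack, token)
		pc++
	}
}

func main() {
	fmt.Println(run("3 2 EXIT"))
	fmt.Println(run("3 2"))
}
```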
<h3 id="conditionals">Conditionals</h3><p>The <code>IF</code> builtin pops the top of the stack. If it is true, evaluation
continues. If it is false, evaluation skips ahead until after a <code>THEN</code>
builtin.</p>
<p>For example:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">"SELECT sm_run('1 1 1 = IF 2 THEN EXIT')"</span>
</pre></div>
<p>Produces <code>2</code>. But</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">"SELECT sm_run('1 1 0 = IF 2 THEN EXIT')"</span>
</pre></div>
<p>Produces <code>1</code>.</p>
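<p>To trace those two examples by hand, here is a small Go sketch of
the skip-ahead behavior. Everything in it is hypothetical and
simplified: <code>run</code> handles only <code>=</code>, <code>IF</code>, <code>THEN</code> and
<code>EXIT</code>, I'm assuming <code>=</code> pushes the text <code>true</code> or <code>false</code>,
and nested <code>IF</code>s are not handled.</p>

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var (
	errPCOutOfBounds  = errors.New("PC out of bounds")
	errStackUnderflow = errors.New("stack underflow")
)

// run is a hypothetical evaluator for illustration only.
func run(program string) (string, error) {
	tokens := strings.Fields(program)
	var stack []string
	pop := func() (string, error) {
		if len(stack) == 0 {
			return "", errStackUnderflow
		}
		top := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		return top, nil
	}
	for pc := 0; pc < len(tokens); pc++ {
		switch token := tokens[pc]; token {
		case "EXIT":
			return pop()
		case "=":
			b, err := pop()
			if err != nil {
				return "", err
			}
			a, err := pop()
			if err != nil {
				return "", err
			}
			// Assumption: comparisons push "true" or "false" as text.
			stack = append(stack, strconv.FormatBool(a == b))
		case "IF":
			cond, err := pop()
			if err != nil {
				return "", err
			}
			if cond != "true" {
				// Skip ahead to THEN; the loop's pc++ steps past it.
				for pc < len(tokens) && tokens[pc] != "THEN" {
					pc++
				}
			}
		case "THEN":
			// Only a marker for IF; nothing to do when reached normally.
		default:
			stack = append(stack, token)
		}
	}
	return "", errPCOutOfBounds
}

func main() {
	fmt.Println(run("1 1 1 = IF 2 THEN EXIT"))
	fmt.Println(run("1 1 0 = IF 2 THEN EXIT"))
}
```

<p>In the second program the <code>2</code> is never pushed, so the <code>1</code> from
the start of the program is still on top when <code>EXIT</code> runs.</p>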
<h3 id="implementation">Implementation</h3><p>Joining the <code>EXIT</code> condition in the interpreter loop we get:</p>
<div class="highlight"><pre><span></span><span class="p">...</span>
<span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="k">true</span><span class="w"> </span><span class="n">LOOP</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'IF'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab last item from stack</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Remove one item from stack</span>
<span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">boolean</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">pc</span><span class="p">]</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">'THEN'</span><span class="w"> </span><span class="n">LOOP</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Skip past THEN</span>
<span class="w"> </span><span class="k">ELSE</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'THEN'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Just skip past it</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'EXIT'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="p">...</span>
</pre></div>
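<p>To look at the skip logic on its own, here is a minimal sketch that
pulls the inner loop out into a standalone function. The name
<code>sm_skip_past_then</code> and the local variable <code>i</code> are hypothetical and not
part of the code above. It assumes tokens are split on single spaces.</p>
<div class="highlight"><pre><span></span>-- Hypothetical helper for illustration only; not part of sm.sql.
CREATE OR REPLACE FUNCTION sm_skip_past_then(tokens text[], pc int) RETURNS int AS $$
DECLARE
  i int := pc;
BEGIN
  -- Once i runs past the end of the array, tokens[i] is NULL, the
  -- comparison yields NULL, and the WHILE loop stops. So a missing
  -- THEN does not loop forever.
  WHILE tokens[i] != 'THEN' LOOP
    i = i + 1;
  END LOOP;
  RETURN i + 1; -- Skip past THEN
END;
$$ LANGUAGE plpgsql;

-- IF is the 5th token, THEN the 7th, so evaluation resumes at EXIT.
SELECT sm_skip_past_then(string_to_array('1 1 0 = IF 2 THEN EXIT', ' '), 5); -- 8
</pre></div>
<p>Note that the loop stops at the first <code>THEN</code> it finds, so nested
conditionals are not handled by this approach.</p>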
<h3 id="other-builtins">Other builtins</h3><p>The <code>DUP</code> builtin makes a copy of the top of the stack. The <code>SWAP</code>
builtin swaps the order of the top two items on the stack. And the
<code>1-</code> builtin subtracts 1 from the top of the stack.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'DUP'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab item</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Add it to the stack</span>
<span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">stack</span><span class="p">,</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'1-'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab item</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Rewrite top of stack</span>
<span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'SWAP'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab two items from stack</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Swap the two by writing them back in reverse order</span>
<span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="p">...</span>
</pre></div>
<p>It's important that every builtin handles incrementing the program
counter and skipping to the beginning of the loop itself, because some
builtins (like <code>IF</code> above) advance the program counter differently
depending on a condition.</p>
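<p>As a rough illustration of the stack effects, the following sketch
applies the same array operations outside the interpreter. It uses the
standard <code>cardinality</code> function in place of <code>sm_alength</code> so that it
runs on its own, and the variable names are only for this example.</p>
<div class="highlight"><pre><span></span>DO $$
DECLARE
  stack text[] := ARRAY['1', '2'];
  top int;
  tmp text;
BEGIN
  -- DUP: copy the top item
  top = cardinality(stack);
  stack = array_append(stack, stack[top]);
  ASSERT stack = ARRAY['1', '2', '2'];

  -- 1-: subtract 1 from the top item
  top = cardinality(stack);
  stack[top] = (stack[top]::int - 1)::text;
  ASSERT stack = ARRAY['1', '2', '1'];

  -- SWAP: exchange the top two items
  top = cardinality(stack);
  tmp = stack[top];
  stack[top] = stack[top - 1];
  stack[top - 1] = tmp;
  ASSERT stack = ARRAY['1', '1', '2'];
END
$$;
</pre></div>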
<p>The last few builtins are the simplest: comparison and arithmetic
operations that produce booleans or integers.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'='</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab two items from stack</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Remove one item from stack</span>
<span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="c1">-- Replace last item on stack</span>
<span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'>'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab two items from stack</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Remove one item from stack</span>
<span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="c1">-- Replace last item on stack</span>
<span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'+'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab two items from stack</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Remove one item from stack</span>
<span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="c1">-- Replace last item on stack</span>
<span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'-'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab two items from stack</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Remove one item from stack</span>
<span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="c1">-- Replace last item on stack</span>
<span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'*'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab two items from stack</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Remove one item from stack</span>
<span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="c1">-- Replace last item on stack</span>
<span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'/'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab two items from stack</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Remove one item from stack</span>
<span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="c1">-- Replace last item on stack</span>
<span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="p">...</span>
</pre></div>
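<p>The six branches above differ only in the operator they apply. One
possible way to cut the repetition, sketched here under the assumption
that all operands are integers, is a small dispatch function. The name
<code>sm_binop</code> is hypothetical and not part of the code above.</p>
<div class="highlight"><pre><span></span>-- Hypothetical helper for illustration only; not part of sm.sql.
CREATE OR REPLACE FUNCTION sm_binop(op text, a int, b int) RETURNS text AS $$
BEGIN
  -- a is the deeper stack item, b is the top of the stack.
  RETURN CASE op
    WHEN '=' THEN (a = b)::text
    WHEN '>' THEN (a > b)::text
    WHEN '+' THEN (a + b)::text
    WHEN '-' THEN (a - b)::text
    WHEN '*' THEN (a * b)::text
    WHEN '/' THEN (a / b)::text -- integer division; errors if b is 0
  END; -- NULL for an unknown operator
END;
$$ LANGUAGE plpgsql;

SELECT sm_binop('-', 5, 3); -- 2
SELECT sm_binop('/', 7, 2); -- 3
SELECT sm_binop('>', 2, 1); -- true

-- Applying it to a stack the same way the branches above do:
DO $$
DECLARE
  stack text[] := ARRAY['5', '3'];
  n int := cardinality(stack);
BEGIN
  stack[n - 1] = sm_binop('-', stack[n - 1]::int, stack[n]::int);
  stack = stack[1:n - 1];
  ASSERT stack = ARRAY['2'];
END
$$;
</pre></div>
<p>The explicit branches in the post are longer, but they keep every
builtin visible in one place in the interpreter loop.</p>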
<h3 id="function-definitions">Function definitions</h3><p>Functions here will differ from Forth, borrowing elements of machine
code. Return pointers will be stored in a dedicated return pointer
stack. We could store them on the data stack, but that would require more
effort on the part of the programmer to restore the stack. Calling
<code>RET</code> inside a function pops a return pointer off the return pointer
stack.</p>
<p>Here's a simple function definition: <code>DEF plus + RET</code>.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'DEF'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">pc</span><span class="o">+</span><span class="mi">1</span><span class="p">];</span><span class="w"> </span><span class="c1">-- function name</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span><span class="w"> </span><span class="c1">-- starting pc</span>
<span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">pc</span><span class="p">]</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">'RET'</span><span class="w"> </span><span class="n">LOOP</span>
<span class="w"> </span><span class="c1">-- RAISE NOTICE '[Debug] skipping past: %.', tokens[pc];</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">defs</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">defs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">hstore</span><span class="p">(</span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span>
<span class="w"> </span><span class="k">ELSE</span>
<span class="w"> </span><span class="n">defs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">defs</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">hstore</span><span class="p">(</span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">-- continue past 'RET'</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="p">...</span>
</pre></div>
<p>There doesn't seem to be a way to combine a NULL hstore value and a
non-NULL hstore value. So that's why we need that special case.</p>
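<p>As a rough sketch of what goes wrong, assuming the stock <code>hstore</code> extension where <code>||</code> yields <code>NULL</code> if either side is <code>NULL</code>, the queries below show the behavior and one possible alternative to the special case: wrapping the possibly-NULL value in <code>COALESCE</code> with an empty hstore. The key <code>plus</code> and the position <code>3</code> are made up for illustration.</p>
<div class="highlight"><pre><span></span>CREATE EXTENSION IF NOT EXISTS hstore;

-- Concatenating with a NULL hstore is expected to give NULL, not the right-hand side
SELECT (NULL::hstore || hstore('plus', '3')) IS NULL AS lost_definition;

-- Possible alternative: substitute an empty hstore for NULL before concatenating
SELECT COALESCE(NULL::hstore, ''::hstore) || hstore('plus', '3') AS defs;
</pre></div>
<p>Initializing <code>defs</code> to an empty hstore in the <code>DECLARE</code> block would likely have the same effect, but the explicit <code>IF defs IS NULL</code> branch is the easiest to read.</p>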
<h3 id="return">Return</h3><p>The <code>RET</code> builtin pops a value off the return pointer stack and jumps
to it.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'RET'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab last return pointer</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rps</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">rps</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Drop last return pointer from stack</span>
<span class="w"> </span><span class="n">rps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rps</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">rps</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="c1">-- Jump to last return pointer</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="p">...</span>
</pre></div>
<h3 id="function-calls">Function calls</h3><p>Forming the other half of function calls is the <code>CALL</code> builtin. This
places the program counter (plus one, past the <code>CALL</code> token) onto the
return pointer stack and jumps to the position of the function if it
exists.</p>
<p>A simple function call for the above <code>plus</code> function might be: <code>2 3
plus CALL</code> and would produce <code>5</code> on the top of the stack.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'CALL'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="c1">-- Grab item</span>
<span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span>
<span class="w"> </span><span class="c1">-- Remove one item from stack</span>
<span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="c1">-- Store return pointer</span>
<span class="w"> </span><span class="n">rps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">rps</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">)::</span><span class="nb">text</span><span class="p">);</span>
<span class="w"> </span><span class="c1">-- Fail if function not defined</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="n">defs</span><span class="o">?</span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">'No such function, %.'</span><span class="p">,</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="c1">-- Otherwise jump to function</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="n">NOTICE</span><span class="w"> </span><span class="s1">'[Debug] Jumping to: %:%.'</span><span class="p">,</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">defs</span><span class="o">-></span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">defs</span><span class="o">-></span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="p">...</span>
</pre></div>
<p>And that's it! Those are all the basic instructions we need. Store all that code in <code>sm.sql</code> and grab the <code>test.sh</code> code from the previous post:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>./test.sh
sudo<span class="w"> </span>-u<span class="w"> </span>postgres<span class="w"> </span>psql<span class="w"> </span>-c<span class="w"> </span><span class="s2">"</span><span class="k">$(</span><span class="nb">printf</span><span class="w"> </span><span class="s2">"%s;\n%s"</span><span class="w"> </span><span class="s2">"</span><span class="k">$(</span>cat<span class="w"> </span><span class="nv">$1</span><span class="k">)</span><span class="s2">"</span><span class="w"> </span><span class="s2">"</span><span class="nv">$2</span><span class="s2">"</span><span class="k">)</span><span class="s2">"</span>
</pre></div>
<p>And try out our port of recursive Fibonacci:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">"SELECT sm_run('</span>
<span class="s2">DEF fib</span>
<span class="s2"> DUP 1 > IF</span>
<span class="s2"> 1- DUP 1- fib CALL SWAP fib CALL + THEN</span>
<span class="s2"> RET</span>
<span class="s2">20 fib CALL</span>
<span class="s2">EXIT')"</span>
...
<span class="w"> </span>sm_run
--------
<span class="w"> </span><span class="m">6765</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span>
</pre></div>
<p>Happy PL/pgSQL- and Forth-ish-ing!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Latest post is up! Writing a Forth(-inspired language) implementation from scratch in PL/pgSQL. Because who doesn't want to be able to run stack machine code from SELECT statements in PostgreSQL?<a href="https://t.co/sbxhuDp1J9">https://t.co/sbxhuDp1J9</a> <a href="https://t.co/9nrHEIhRPa">pic.twitter.com/9nrHEIhRPa</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1453958284109500417?ref_src=twsrc%5Etfw">October 29, 2021</a></blockquote></p>
http://notes.eatonphil.com/exploring-plpgsql-forth-like.htmlFri, 29 Oct 2021 00:00:00 +0000
- Exploring PL/pgSQL: Strings, arrays, recursion, and parsing JSONhttp://notes.eatonphil.com/exploring-plpgsql.html<p class="note">
Next in exploring PL/pgSQL:
<br />
<a href="exploring-plpgsql-forth-like.html">Implementing a Forth-like interpreter</a>
</p><p>PostgreSQL comes with a builtin imperative programming language called
PL/pgSQL. I used to think this language was scary because it has a bit
more adornment than your usual language does. But looking deeper, it's
actually reasonably pleasant to program in.</p>
<p>In this post we'll get familiar with it by working with strings,
arrays and recursive functions. We'll top it all off by building a
parser for a subset of JSON (no nested objects, no arrays, no unicode,
no decimals).</p>
<p>The goal here is not production-quality code (an amazing JSON library
is already built into PostgreSQL) but simply to get more familiar with
the PL/pgSQL language.</p>
<p>All code for this post is available on <a href="https://github.com/eatonphil/exploring-plpgsql">GitHub</a>.</p>
<h3 id="creating-functions">Creating functions</h3><p>Functions are created with a <code>CREATE</code> statement, much like tables. Here's a very simple one that
returns the length of a string:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">slength</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">s</span><span class="p">);</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<p>It's not a very useful function because <code>length</code> already exists but
the point is to see a basic custom function.</p>
<p>All statements in PL/pgSQL must end in a semicolon. Arguments do not
have to be named. If they are not named they get default names of <code>$1</code>
to <code>$N</code>.</p>
<h4 id="named/unnamed-arguments">Named/unnamed arguments</h4><p>Here's how the function could be written without named arguments:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">slength</span><span class="p">(</span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="err">$</span><span class="mi">1</span><span class="p">);</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<h4 id="out-declarations">Out declarations</h4><p>PL/pgSQL also allows you to declare which variables will be returned
in the function argument list. These are called OUT parameters, but as far
as I can tell they are not like OUT parameters in C#, where you
modify the value of a variable in an external scope.</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">slength</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">s</span><span class="p">);</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<p>This is still equivalent to the first function and is basically a shortcut for:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">slength</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="nb">int</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">s</span><span class="p">);</span>
<span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">i</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
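<p>If you declare OUT parameters, the <code>RETURNS <type></code> clause is
optional, and when present it must agree with the type implied by the OUT
parameters. Without OUT parameters you must include <code>RETURNS <type></code>
in the function signature; PostgreSQL rejects a function definition that has
no result type.</p>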
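<p>Here is a minimal sketch of an OUT parameter on its own. With an OUT parameter PostgreSQL can infer the result type from the parameter list, so this version leaves the <code>RETURNS</code> clause out. The name <code>slength_out</code> is hypothetical, chosen so it does not clash with the <code>slength</code> definitions above.</p>
<div class="highlight"><pre><span></span>CREATE OR REPLACE FUNCTION slength_out(s text, OUT i int) AS $$
BEGIN
  -- Assigning to the OUT parameter sets the function's result
  i = length(s);
END;
$$ LANGUAGE plpgsql;

SELECT slength_out('foo');
</pre></div>
<p>Spelling out <code>RETURNS int</code> as well, as the examples above do, keeps the return type visible at a glance.</p>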
<p>Don't worry about case sensitivity too much. It's really only
important, as in typical SQL, for mixed-case table and column
names. But we won't be dealing with that situation in this article
focused on programming PL/pgSQL.</p>
<h4 id="testing-it-out">Testing it out</h4><p>Once the function is created, you can call it like <code>SELECT
slength('foo');</code>. So here's a helper script to load a SQL file and run
a command:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>./test.sh
sudo<span class="w"> </span>-u<span class="w"> </span>postgres<span class="w"> </span>psql<span class="w"> </span>-c<span class="w"> </span><span class="s2">"</span><span class="k">$(</span><span class="nb">printf</span><span class="w"> </span><span class="s2">"%s;\n%s"</span><span class="w"> </span><span class="s2">"</span><span class="k">$(</span>cat<span class="w"> </span><span class="nv">$1</span><span class="k">)</span><span class="s2">"</span><span class="w"> </span><span class="s2">"</span><span class="nv">$2</span><span class="s2">"</span><span class="k">)</span><span class="s2">"</span>
$<span class="w"> </span>chmod<span class="w"> </span>+x<span class="w"> </span>./test.sh
</pre></div>
<p>After storing the above <code>slength</code> code in <code>slength.sql</code> we can run a test:</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">test</span><span class="p">.</span><span class="n">sh</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">slength</span><span class="p">.</span><span class="k">sql</span><span class="w"> </span><span class="ss">"SELECT slength('foo')"</span>
<span class="w"> </span><span class="n">slength</span>
<span class="c1">---------</span>
<span class="w"> </span><span class="mi">3</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">row</span><span class="p">)</span>
</pre></div>
<p>Easy!</p>
<h3 id="numbers-and-recursion">Numbers and recursion</h3><p>OK, now that we've got the basics of function definition down and a way
to test the code, let's write a Fibonacci function.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>./fib.sql
CREATE<span class="w"> </span>OR<span class="w"> </span>REPLACE<span class="w"> </span>FUNCTION<span class="w"> </span>fib<span class="o">(</span>i<span class="w"> </span>int<span class="o">)</span><span class="w"> </span>RETURNS<span class="w"> </span>int<span class="w"> </span>AS<span class="w"> </span><span class="nv">$$</span>
BEGIN
<span class="w"> </span>IF<span class="w"> </span><span class="nv">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>OR<span class="w"> </span><span class="nv">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="w"> </span>THEN
<span class="w"> </span>RETURN<span class="w"> </span>i<span class="p">;</span>
<span class="w"> </span>END<span class="w"> </span>IF<span class="p">;</span>
<span class="w"> </span>RETURN<span class="w"> </span>fib<span class="o">(</span>i<span class="w"> </span>-<span class="w"> </span><span class="m">1</span><span class="o">)</span><span class="w"> </span>+<span class="w"> </span>fib<span class="o">(</span>i<span class="w"> </span>-<span class="w"> </span><span class="m">2</span><span class="o">)</span><span class="p">;</span>
END<span class="p">;</span>
<span class="nv">$$</span><span class="w"> </span>LANGUAGE<span class="w"> </span>plpgsql<span class="p">;</span>
</pre></div>
<p>Everything in the <code>IF</code> condition is normal SQL WHERE clause syntax. This
makes it very easy for folks familiar with SQL to pick up conditionals
in PL/pgSQL.</p>
<p>And there's no special syntax to allow function recursion. Nice!</p>
<p>Run and test this function:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>./fib.sql<span class="w"> </span><span class="s2">"SELECT fib(10)"</span>
<span class="w"> </span>fib
-----
<span class="w"> </span><span class="m">55</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span>
</pre></div>
<p>Getting the hang of it?</p>
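<p>Because the <code>IF</code> condition is an ordinary SQL boolean expression, other WHERE-style operators work there too. As a small sketch, here is the same function with the base case written using <code>IN</code>. The name <code>fib_in</code> is hypothetical, used only to keep it separate from <code>fib</code>.</p>
<div class="highlight"><pre><span></span>CREATE OR REPLACE FUNCTION fib_in(i int) RETURNS int AS $$
BEGIN
  -- Same base case as fib, expressed with the SQL IN operator
  IF i IN (0, 1) THEN
    RETURN i;
  END IF;
  RETURN fib_in(i - 1) + fib_in(i - 2);
END;
$$ LANGUAGE plpgsql;

SELECT fib_in(10);
</pre></div>
<p>This should return the same value as <code>fib(10)</code> above.</p>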
<h3 id="strings-and-arrays">Strings and arrays</h3><p>You may have noticed that <code>length</code> used in <code>slength</code> is a builtin
PostgreSQL function for dealing with strings. All builtin functions in
PostgreSQL can be used in PL/pgSQL.</p>
<p>In order to get familiar with using arrays in PL/pgSQL let's write a
<code>string_to_array</code> function.</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">string_to_array</span><span class="p">.</span><span class="k">sql</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">string_to_array</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">char</span><span class="p">[]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="nb">char</span><span class="p">[];</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="n">LOOP</span>
<span class="w"> </span><span class="n">a</span><span class="p">[</span><span class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">a</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<p>This is one way to do it: by modifying array values directly by index. We
need to coalesce because calling <code>array_length</code> on a NULL or empty array
returns <code>NULL</code>.</p>
<p>Another way to do this is by calling the builtin function <code>array_append</code>.</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">string_to_array</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">char</span><span class="p">[]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="nb">char</span><span class="p">[];</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="n">LOOP</span>
<span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)::</span><span class="nb">char</span><span class="p">);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">a</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<p>We can test and run both:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>./string_to_array.sql<span class="w"> </span><span class="s2">"SELECT string_to_array('foo')"</span>
<span class="w"> </span>string_to_array
-----------------
<span class="w"> </span><span class="o">{</span>f,o,o<span class="o">}</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span>
$<span class="w"> </span>./test.sh<span class="w"> </span>./string_to_array2.sql<span class="w"> </span><span class="s2">"SELECT string_to_array('foo')"</span>
<span class="w"> </span>string_to_array
-----------------
<span class="w"> </span><span class="o">{</span>f,o,o<span class="o">}</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span>
</pre></div>
<p>Of course the builtin alternative might be <code>SELECT
regexp_split_to_array('foo', '')</code> but we need the practice.</p>
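<p>The repeated <code>COALESCE</code> calls in the two versions above exist because the array starts out as <code>NULL</code>. One possible way to avoid them, sketched here under the hypothetical name <code>string_to_array3</code>, is to initialize the array to an empty value and count elements with the builtin <code>cardinality</code> function, which returns 0 for an empty array.</p>
<div class="highlight"><pre><span></span>CREATE OR REPLACE FUNCTION string_to_array3(s text) RETURNS char[] AS $$
DECLARE
  a char[] = '{}';  -- start with an empty array rather than NULL
BEGIN
  -- cardinality(a) is 0 for the empty array, so no COALESCE is needed
  WHILE length(s) > cardinality(a) LOOP
    a = array_append(a, substr(s, cardinality(a) + 1, 1)::char);
  END LOOP;
  RETURN a;
END;
$$ LANGUAGE plpgsql;

SELECT string_to_array3('foo');
</pre></div>
<p>The trade-off is that <code>cardinality</code> counts elements across all dimensions, which is the same as the length here only because the array is one-dimensional.</p>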
<h3 id="custom-compound-types">Custom compound types</h3><p>If we're going to lex and parse JSON, we're going to want to return an
array of tokens from the lexer. A token will need to contain the type
(e.g. number, string, syntax) and the string value of the token
(e.g. <code>1</code>, <code>{</code>, <code>my great key</code>).</p>
<p>PostgreSQL allows us to create compound types that we can then use as
the base of an array:</p>
<div class="highlight"><pre><span></span><span class="nv">DROP</span><span class="w"> </span><span class="nv">TYPE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="nv">EXISTS</span><span class="w"> </span><span class="nv">json_token</span><span class="w"> </span><span class="nv">CASCADE</span><span class="c1">;</span>
<span class="nv">CREATE</span><span class="w"> </span><span class="nv">TYPE</span><span class="w"> </span><span class="nv">json_token</span><span class="w"> </span><span class="nv">AS</span><span class="w"> </span><span class="ss">(</span>
<span class="w"> </span><span class="nv">kind</span><span class="w"> </span><span class="nv">text</span>,
<span class="w"> </span><span class="nv">value</span><span class="w"> </span><span class="nv">text</span>
<span class="ss">)</span><span class="c1">;</span>
</pre></div>
<p>We need to add <code>CASCADE</code> here because functions will have this type in
their signature and it otherwise makes PostgreSQL unhappy to delete
the type used in a function before deleting the function.</p>
<p>We can create literals of this type like <code>SELECT ('number',
'12')::json_token</code>.</p>
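<p>As a brief sketch of how such values can be used, assuming the <code>json_token</code> type above has been created: fields are read by wrapping the value in parentheses and naming the field, and values of the type can be appended to an array like any other element.</p>
<div class="highlight"><pre><span></span>-- Read individual fields of a compound value
SELECT (('number', '12')::json_token).kind;
SELECT (('number', '12')::json_token).value;

-- Build an array of tokens, starting from a NULL array
SELECT array_append(NULL::json_token[], ('syntax', '{')::json_token);
</pre></div>
<p>The parentheses around the value matter: without them PostgreSQL would try to read the name before the dot as a table name.</p>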
<p>Now we're ready to build out the lexer.</p>
<h3 id="lexing">Lexing</h3><p>The lexer's job is to clump together groups of characters into tokens.</p>
<p>I'm going to describe this function in literate code.</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">json_lex</span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="n">json_token</span><span class="p">[])</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="n">json_token</span><span class="p">[]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
</pre></div>
<p>This function takes a string in and returns an array of JSON tokens.</p>
<div class="highlight"><pre><span></span><span class="k">DECLARE</span><span class="w"> </span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Index in loop</span>
<span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="nb">text</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Current character in loop</span>
<span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="nb">text</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Current accumulated characters</span>
</pre></div>
<p>We need to declare all variables up front.</p>
<div class="highlight"><pre><span></span><span class="k">BEGIN</span>
<span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">j</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">LOOP</span>
<span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
</pre></div>
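<p>The main loop walks the input one character at a time. Since
<code>substr</code> is 1-indexed, <code>i</code> starts at 1 and the loop
runs until it passes <code>length(j)</code>.</p>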
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">-- Handle syntax characters</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'{'</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'}'</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">','</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">':'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">ts</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s1">'syntax'</span><span class="p">,</span><span class="w"> </span><span class="k">c</span><span class="p">)::</span><span class="n">json_token</span><span class="p">);</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
</pre></div>
<p>First we check whether the character is a syntax character. If it is,
we append it to the array of tokens, increment the index, and go back to
the start of the main loop.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">-- Handle whitespace</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">regexp_replace</span><span class="p">(</span><span class="k">c</span><span class="p">,</span><span class="w"> </span><span class="s1">'^\s+'</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
</pre></div>
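<p>Then we check for whitespace characters. If stripping the whitespace
from the current character leaves an empty string then we know it was
whitespace. We could also have done something like <code>IF c = ' ' OR c = E'\n'
... THEN</code> instead. (The <code>E</code> prefix matters: without it,
<code>'\n'</code> is a literal backslash followed by <code>n</code> under
the default <code>standard_conforming_strings</code> setting.)</p>
<p>As before, if we find a whitespace character we skip it (we don't
accumulate it) and restart the main loop.</p>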
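<p>As an aside that is separate from the lexer itself, here is a small
sketch of how the two ways of writing a newline literal differ. It
assumes <code>standard_conforming_strings</code> is on, which is the
default in modern PostgreSQL; the column aliases are only illustrative.</p>
<div class="highlight"><pre><span></span>-- Assumes standard_conforming_strings = on (the default).
SELECT length('\n')  AS plain_literal,   -- 2: a backslash and the letter n
       length(E'\n') AS escape_literal,  -- 1: a single newline character
       -- The lexer's whitespace check, applied to a real newline:
       regexp_replace(E'\n', '^\s+', '') = '' AS newline_is_whitespace; -- true
</pre></div>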
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">-- Handle strings</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'"'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">'"'</span><span class="w"> </span><span class="n">LOOP</span>
<span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="k">c</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">ts</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s1">'string'</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">)::</span><span class="n">json_token</span><span class="p">);</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
</pre></div>
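<p>Next we handle strings: after an opening quote we accumulate every
character up to the closing quote into a single token before
restarting the main loop. Escape sequences such as <code>\"</code> are
not handled, and an unterminated string would loop forever because
<code>substr</code> returns an empty string past the end of the input.</p>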
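<p>If we wanted to fail on unterminated strings, one option might look
like the following standalone sketch. It is not part of
<code>json_lex</code>; the function name
<code>lex_string_sketch</code> and its <code>string_start</code>
parameter are made up for illustration. It relies on
<code>substr</code> returning an empty string once the index runs past
the end of the input.</p>
<div class="highlight"><pre><span></span>-- Hypothetical helper: reads one string whose opening quote is at string_start.
CREATE OR REPLACE FUNCTION lex_string_sketch(j text, string_start int, OUT token text) AS $$
DECLARE
  i int = string_start + 1; -- Skip the opening quote
  c text;                   -- Current character in loop
BEGIN
  token = '';
  LOOP
    c = substr(j, i, 1);
    -- substr returns '' past the end of the input: no closing quote was found
    IF c = '' THEN
      RAISE EXCEPTION 'Unterminated string starting at index: %.', string_start;
    END IF;
    EXIT WHEN c = '"';
    token = token || c;
    i = i + 1;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT lex_string_sketch('"abc"', 1); -- abc
SELECT lex_string_sketch('"abc', 1);  -- raises the exception instead of looping forever
</pre></div>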
<div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">-- Handle numbers</span>
<span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s1">'^[0-9]+$'</span><span class="w"> </span><span class="n">LOOP</span>
<span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="k">c</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">ts</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s1">'number'</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">)::</span><span class="n">json_token</span><span class="p">);</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
</pre></div>
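<p>Then we look for integers by accumulating consecutive digits into a
single token. Negative numbers, decimals, and exponents are not
handled.</p>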
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">'Unknown character: %, at index: %; already found: %.'</span><span class="p">,</span><span class="w"> </span><span class="k">c</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">ts</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<p>Lastly, if none of those lexing handlers match, we give up! Then the
loop is done and the function is too.</p>
<p>There's no <code>RETURN</code> statement because we already declared an <code>OUT</code>
variable: whatever is in <code>ts</code> when the function ends is returned.</p>
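<p>To see the <code>OUT</code> behavior in isolation, here is a minimal
sketch. The function <code>add_one_sketch</code> is a made-up example
and not part of the JSON library.</p>
<div class="highlight"><pre><span></span>-- Hypothetical example: the result is whatever the OUT variable holds at the end.
CREATE OR REPLACE FUNCTION add_one_sketch(x int, OUT y int) AS $$
BEGIN
  y = x + 1; -- Assigning to the OUT variable sets the return value
END;
$$ LANGUAGE plpgsql;

SELECT add_one_sketch(41); -- 42
</pre></div>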
<p>If we test and run this now:</p>
<div class="highlight"><pre><span></span>./test.sh<span class="w"> </span>./json.sql<span class="w"> </span><span class="s2">"SELECT json_lex('{\"flubberty\": 12, \"nice\": \"a\"}')"</span>
<span class="w"> </span>json_lex
----------------------------------------------------------------------------------------------------------------------------------------
<span class="w"> </span><span class="o">{</span><span class="s2">"(syntax,{)"</span>,<span class="s2">"(string,flubberty)"</span>,<span class="s2">"(syntax,:)"</span>,<span class="s2">"(number,12)"</span>,<span class="s2">"(syntax,\",\")"</span>,<span class="s2">"(string,nice)"</span>,<span class="s2">"(syntax,:)"</span>,<span class="s2">"(string,a)"</span>,<span class="s2">"(syntax,})"</span><span class="o">}</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span>
</pre></div>
<p>It's messy but it worked! Now on to parsing.</p>
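<p>If the array output is too messy to read, one option is to expand the
tokens into rows with <code>unnest</code>. This is only a sketch; it
assumes the <code>json_token</code> fields are named
<code>kind</code> and <code>value</code>, as they are used in the parser
below.</p>
<div class="highlight"><pre><span></span>-- One row per token, with the composite fields expanded into columns.
SELECT t.kind, t.value
FROM unnest(json_lex('{"flubberty": 12, "nice": "a"}')) AS t;
</pre></div>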
<h3 id="parsing">Parsing</h3><p>Our parser will only accept JSON objects. JSON objects will be defined
as an array of key-value pairs. Custom types make this nice again.</p>
<div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">TYPE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">json_key_value</span><span class="w"> </span><span class="k">CASCADE</span><span class="p">;</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TYPE</span><span class="w"> </span><span class="n">json_key_value</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span>
<span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="nb">text</span>
<span class="p">);</span>
</pre></div>
<p>One thing PostgreSQL does not make easy is sum types or parametric
types. So the value is stored as text: the parser below stores the text
form of the whole token, which <code>json_get</code> later casts back to a
<code>json_token</code>. The user can then easily cast the result to a
number. And again, we're not going to support nested objects/arrays.
Using <code>hstore</code> for key-values might be the better alternative
if we wanted to build a real JSON parser.</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">json_parse</span><span class="p">(</span><span class="n">ts</span><span class="w"> </span><span class="n">json_token</span><span class="p">[],</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="n">json_key_value</span><span class="p">[]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="n">json_token</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Current token in tokens loop</span>
<span class="w"> </span><span class="n">kvs</span><span class="w"> </span><span class="n">json_key_value</span><span class="p">[];</span>
<span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="nb">text</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">'syntax'</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">'{'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">'Invalid JSON, must be an object, got: %.'</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
</pre></div>
<p>First up in the parser are the variable declarations and a check that
this list of tokens starts a JSON object.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">'syntax'</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">'}'</span><span class="w"> </span><span class="n">LOOP</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">array_length</span><span class="p">(</span><span class="n">kvs</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">'syntax'</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">','</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">'JSON key-value pair must be followed by a comma or closing brace, got: %.'</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
</pre></div>
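<p>Then we loop to find each key-value pair. If one has already been
found, we need to find a comma before the next pair. On the first
iteration <code>kvs</code> is still <code>NULL</code>, so the
<code>array_length</code> comparison is not true and the comma check is
skipped.</p>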
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">'string'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">'JSON object must start with string key, got: %.'</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">'syntax'</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o"><></span><span class="w"> </span><span class="s1">':'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">'JSON object must start with string key followed by colon, got: %.'</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'number'</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'string'</span><span class="w"> </span><span class="k">THEN</span>
<span class="w"> </span><span class="n">kvs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">kvs</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">k</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)::</span><span class="n">json_key_value</span><span class="p">);</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">'Invalid key-value pair syntax, got: %.'</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">kvs</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<p>Then we just look for the key, colon, value syntax and fail if we
don't see it. And that's it! Very simple when not dealing with arrays
and nested objects.</p>
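<p>One detail the comma check in the parser leans on is that
<code>array_length</code> returns <code>NULL</code>, not zero, for a
<code>NULL</code> or empty array, and an <code>IF</code> treats a
<code>NULL</code> condition as not true. A quick sketch (the column
aliases are only illustrative):</p>
<div class="highlight"><pre><span></span>SELECT array_length(NULL::int[], 1)              AS len_of_null,  -- NULL
       array_length(ARRAY[]::int[], 1)           AS len_of_empty, -- NULL as well
       coalesce(array_length(NULL::int[], 1), 0) AS len_or_zero;  -- 0
</pre></div>
<p>Wrapping the call in <code>coalesce</code> is one way to make the
intent explicit if you prefer not to rely on <code>NULL</code>
semantics.</p>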
<h3 id="helpers">Helpers</h3><p>Lastly, it would be nice to have a single function that calls both the lexer and the parser:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">json_from_string</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="n">json_key_value</span><span class="p">[]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">json_parse</span><span class="p">(</span><span class="n">json_lex</span><span class="p">(</span><span class="n">s</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
<p>And another function to look up a value in a parsed object by key:</p>
<div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">json_get</span><span class="p">(</span><span class="n">kvs</span><span class="w"> </span><span class="n">json_key_value</span><span class="p">[],</span><span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span>
<span class="k">DECLARE</span>
<span class="w"> </span><span class="n">kv</span><span class="w"> </span><span class="n">json_key_value</span><span class="p">;</span>
<span class="k">BEGIN</span>
<span class="w"> </span><span class="n">FOREACH</span><span class="w"> </span><span class="n">kv</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="nb">ARRAY</span><span class="w"> </span><span class="n">kvs</span><span class="w"> </span><span class="n">LOOP</span>
<span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">kv</span><span class="p">.</span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="p">(</span><span class="n">kv</span><span class="p">.</span><span class="n">v</span><span class="p">::</span><span class="n">json_token</span><span class="p">).</span><span class="n">value</span><span class="p">;</span><span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span>
<span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span>
<span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">'Key not found.'</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span>
</pre></div>
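<p>Since the parsed object is just an array of composite values, we can
also list every pair with <code>unnest</code> rather than looking keys
up one at a time. This is a sketch that reuses the same cast back to
<code>json_token</code> that <code>json_get</code> does; the
<code>kv</code> alias is only illustrative.</p>
<div class="highlight"><pre><span></span>-- One row per key-value pair in the parsed object.
SELECT kv.k, (kv.v::json_token).value AS v
FROM unnest(json_from_string('{"flubberty": 12, "nice": "a"}')) AS kv;
</pre></div>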
<p>And we're all set!</p>
<h3 id="testing">Testing</h3><p>Let's try some bad syntax (missing a comma between pairs):</p>
<div class="highlight"><pre><span></span>./test.sh<span class="w"> </span>./json.sql<span class="w"> </span><span class="s2">"SELECT json_get(json_from_string('{\"flubberty\": 12 \"nice\": \"a\"}'), 'ipo')"</span>
ERROR:<span class="w"> </span>JSON<span class="w"> </span>key-value<span class="w"> </span>pair<span class="w"> </span>must<span class="w"> </span>be<span class="w"> </span>followed<span class="w"> </span>by<span class="w"> </span>a<span class="w"> </span>comma<span class="w"> </span>or<span class="w"> </span>closing<span class="w"> </span>brace,<span class="w"> </span>got:<span class="w"> </span><span class="o">(</span>string,nice<span class="o">)</span>.
CONTEXT:<span class="w"> </span>PL/pgSQL<span class="w"> </span><span class="k">function</span><span class="w"> </span>json_parse<span class="o">(</span>json_token<span class="o">[]</span>,integer<span class="o">)</span><span class="w"> </span>line<span class="w"> </span><span class="m">18</span><span class="w"> </span>at<span class="w"> </span>RAISE
PL/pgSQL<span class="w"> </span><span class="k">function</span><span class="w"> </span>json_from_string<span class="o">(</span>text<span class="o">)</span><span class="w"> </span>line<span class="w"> </span><span class="m">3</span><span class="w"> </span>at<span class="w"> </span>RETURN
</pre></div>
<p>Sweet, it fails correctly.</p>
<p>Now correct syntax but a missing key:</p>
<div class="highlight"><pre><span></span>./test.sh<span class="w"> </span>./json.sql<span class="w"> </span><span class="s2">"SELECT json_get(json_from_string('{\"flubberty\": 12, \"nice\": \"a\"}'), 'ipo')"</span>
ERROR:<span class="w"> </span>Key<span class="w"> </span>not<span class="w"> </span>found.
CONTEXT:<span class="w"> </span>PL/pgSQL<span class="w"> </span><span class="k">function</span><span class="w"> </span>json_get<span class="o">(</span>json_key_value<span class="o">[]</span>,text<span class="o">)</span><span class="w"> </span>line<span class="w"> </span><span class="m">9</span><span class="w"> </span>at<span class="w"> </span>RAISE
</pre></div>
<p>And finally, correct syntax and existing key:</p>
<div class="highlight"><pre><span></span>./test.sh<span class="w"> </span>./json.sql<span class="w"> </span><span class="s2">"SELECT json_get(json_from_string('{\"flubberty\": 12, \"nice\": \"a\"}'), 'flubberty')"</span>
<span class="w"> </span>json_get
----------
<span class="w"> </span><span class="m">12</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span>
</pre></div>
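<p>Because <code>json_get</code> returns text, the caller has to cast
when a number is needed. A minimal sketch of that, reusing the same
input:</p>
<div class="highlight"><pre><span></span>-- json_get returns the text '12'; cast it before doing arithmetic.
SELECT json_get(json_from_string('{"flubberty": 12, "nice": "a"}'), 'flubberty')::int + 1; -- 13
</pre></div>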
<p>Huzzah! Now hopefully PL/pgSQL is a little less scary to you, whether
or not you decide to use it.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">For everyone dying to write imperative code in PostgreSQL, I wrote a post about PL/pgSQL 👽 It starts with implementing simple string and array functions, to recursive Fibonacci, to a small JSON parsing library. A nice little language with a great stdlib!<a href="https://t.co/m4Tff99N6R">https://t.co/m4Tff99N6R</a> <a href="https://t.co/2ZMJn2foNa">pic.twitter.com/2ZMJn2foNa</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1452339113131139072?ref_src=twsrc%5Etfw">October 24, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/exploring-plpgsql.htmlSun, 24 Oct 2021 00:00:00 +0000
- Experimenting with column- and row-oriented datastructureshttp://notes.eatonphil.com/experimenting-with-column-and-row-oriented-datastructures.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-10-18-experimenting-with-column-and-row-oriented-datastructures.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2021-10-18-experimenting-with-column-and-row-oriented-datastructures.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/experimenting-with-column-and-row-oriented-datastructures.htmlMon, 18 Oct 2021 00:00:00 +0000
- Notes on running Electronhttp://notes.eatonphil.com/notes-on-running-electron.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-10-13-notes-on-running-electron.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2021-10-13-notes-on-running-electron.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/notes-on-running-electron.htmlWed, 13 Oct 2021 00:00:00 +0000
- Enumerating and analyzing 40+ non-V8 JavaScript implementationshttp://notes.eatonphil.com/javascript-implementations.html<p>V8 is, I'm sure, the most used implementation of JavaScript
today. Used in Chrome, (and by extension) Microsoft Edge, Node.js,
etc. Safari's JavaScriptCore and Firefox's SpiderMonkey are also
contenders for extremely mainstream implementations.</p>
<p>But what else is out there? What if I want to embed JavaScript in a C
program, or a Go program, or a Rust program, or a Java program (and
so on)? Or what if I want to run JavaScript on a microcontroller? Or
use it as a base for language research? It turns out there are many
high-quality implementations out there.</p>
<p>This post describes a number of them and their implementation
choices. I'm not going to cover V8, JavaScriptCore, or SpiderMonkey
because they are massive and each hides several interpreters and
compilers inside. Plus, you already know about them.</p>
<p class="note">
I'm going to miss some implementations and get some details
wrong. Please <a href="https://twitter.com/phil_eaton">Tweet</a> or
<a href="mailto:[email protected]">email</a> me with your corrections! I'd be
particularly interested to hear about pure-research; and commercial,
closed-source implementations of JavaScript.
</p><h3 id="corporate-backed">Corporate-backed</h3><p>These are implementations that would make sense to look into for your
own commercial, production applications.</p>
<h4 id="on-the-jvm">On the JVM</h4><ul>
<li><a href="https://github.com/oracle/graaljs">Oracle's GraalJS</a>: compiles JavaScript to JVM bytecode or GraalVM<ul>
<li>Support: Full compatibility with latest ECMAScript specification</li>
<li>Implementation language: Java</li>
<li>Runtime: <a href="https://www.graalvm.org/">GraalVM</a> or <a href="https://www.graalvm.org/reference-manual/js/RunOnJDK/">stock JDK</a></li>
<li>Parser: <a href="https://github.com/oracle/graaljs/blob/master/graal-js/src/com.oracle.js.parser/src/com/oracle/js/parser/Parser.java">Hand-written</a></li>
<li>First release: <a href="https://github.com/oracle/graaljs/releases/tag/vm-19.0.0">2019?</a></li>
<li>Notes: Replaced Nashorn as the default JavaScript implementation in JDK.</li>
</ul>
</li>
<li><a href="https://github.com/mozilla/rhino">Mozilla's Rhino</a>: interprets and compiles JavaScript to JVM bytecode<ul>
<li>Support: ES6</li>
<li>Implementation language: Java</li>
<li>Runtime: Both <a href="https://github.com/mozilla/rhino/blob/master/src/org/mozilla/javascript/Interpreter.java">interpreted through a custom bytecode VM</a> and, as an optimization, <a href="https://github.com/mozilla/rhino/blob/master/src/org/mozilla/javascript/optimizer/Codegen.java">compiled to JVM bytecode</a></li>
<li>Parser: <a href="https://github.com/mozilla/rhino/blob/master/src/org/mozilla/javascript/Parser.java">Hand-written</a></li>
<li>First release: <a href="http://udn.realityripple.com/docs/Mozilla/Projects/Rhino/History">1998?</a></li>
<li>Notes: Replaced by Nashorn as the default JavaScript engine on the JVM, but remains actively developed.</li>
</ul>
</li>
<li><a href="https://github.com/openjdk/nashorn">Oracle's Nashorn</a>: compiles JavaScript to JVM bytecode<ul>
<li>Support: ES5</li>
<li>Implementation language: Java</li>
<li>Runtime: compiles to <a href="https://github.com/openjdk/nashorn/tree/main/src/org.openjdk.nashorn/share/classes/org/openjdk/nashorn/internal/codegen">JVM bytecode</a></li>
<li>Parser: <a href="https://github.com/openjdk/nashorn/blob/main/src/org.openjdk.nashorn/share/classes/org/openjdk/nashorn/internal/parser/Parser.java">Hand-written</a></li>
<li>First release: <a href="https://blogs.oracle.com/nashorn/open-for-business">2012?</a></li>
<li>Notes: Replaced Rhino as default JavaScript implementation on JVM. Replaced by GraalJS more recently, but remains actively developed.</li>
</ul>
</li>
</ul>
<h4 id="embeddable">Embeddable</h4><ul>
<li><a href="https://github.com/nginx/njs">Nginx's njs</a><ul>
<li>Support: ES5</li>
<li>Implementation language: C</li>
<li>Runtime: <a href="https://github.com/nginx/njs/blob/master/src/njs_vmcode.c">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/nginx/njs/blob/master/src/njs_parser.c">Hand-written</a></li>
</ul>
</li>
<li><a href="https://mp2.dk/techblog/chowjs/">ChowJS</a>: proprietary AOT compiler based on QuickJS for game developers<ul>
<li>Support: everything QuickJS does presumably (see further down for QuickJS)</li>
<li>Implementation language: C presumably</li>
<li>Runtime: QuickJS's bytecode interpreter but also an AOT compiler</li>
<li>Parser: QuickJS's presumably</li>
<li>First release: <a href="https://mp2.dk/techblog/chowjs/">2021</a></li>
<li>Notes: Code is not available so exact analysis on these points is not possible at the moment.</li>
</ul>
</li>
<li><a href="https://github.com/ccxvii/mujs">Artifex's mujs</a><ul>
<li>Support: ES5, probably</li>
<li>Implementation language: C</li>
<li>Runtime: <a href="https://github.com/ccxvii/mujs/blob/master/jsrun.c">Bytecode stack-oriented VM</a></li>
<li>Parser: <a href="https://github.com/ccxvii/mujs/blob/master/jsparse.c">Hand-written</a></li>
<li>First release: <a href="https://github.com/ccxvii/mujs/releases/tag/1.0.0">2017?</a></li>
<li>Notes: Originally part of MuPDF viewer, but now broken out. Thanks to <a href="https://twitter.com/rwoodsmall">@rwoodsmall</a> for mentioning!</li>
</ul>
</li>
</ul>
<h4 id="embedded-systems">Embedded Systems</h4><ul>
<li><a href="https://github.com/Samsung/escargot">Samsung's Escargot</a><ul>
<li>Support: ES2020</li>
<li>Implementation language: C++</li>
<li>Runtime: <a href="https://github.com/Samsung/escargot/tree/master/src/interpreter">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/Samsung/escargot/tree/master/src/parser">Hand-written</a></li>
<li>First release: <a href="https://github.com/Samsung/escargot/graphs/contributors">2017?</a></li>
</ul>
</li>
<li><a href="https://github.com/espruino/Espruino">Espruino</a><ul>
<li>Support: parts of ES5, ES6, ES7/8</li>
<li>Implementation language: C</li>
<li>Runtime: Seems like direct recursive interpreting without an AST/intermediate form</li>
<li>Parser: <a href="https://github.com/espruino/Espruino/blob/master/src/jsparse.c">Hand-written</a></li>
<li>First release: <a href="https://github.com/espruino/Espruino/releases/tag/BEFORE_FUNCTION_REFACTOR">2012?</a></li>
</ul>
</li>
<li><a href="https://github.com/cesanta/elk">Cesanta's Elk</a><ul>
<li>Support: subset of ES6</li>
<li>Implementation language: C</li>
<li>Runtime: <a href="https://github.com/cesanta/elk/blob/master/elk.c">Direct recursive interpreter without AST or bytecode VM</a></li>
<li>Parser: <a href="https://github.com/cesanta/elk/blob/master/elk.c">Hand-written</a></li>
<li>First release: <a href="https://github.com/cesanta/elk/releases/tag/0.0.1">2019?</a></li>
<li>Notes: It does all of this with a GC and FFI in <1400 lines of readable C code. Damn.</li>
</ul>
</li>
<li><a href="https://github.com/cesanta/mjs">Cesanta's mJS</a><ul>
<li>Support: subset of ES6</li>
<li>Implementation language: C</li>
<li>Runtime: <a href="https://github.com/cesanta/mjs/blob/master/mjs.c#L3411">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/cesanta/mjs/blob/master/mjs.c#L12780">Hand-written</a></li>
<li>First release: <a href="https://github.com/cesanta/mjs/releases/tag/1.5">2017?</a></li>
</ul>
</li>
<li><a href="https://github.com/Moddable-OpenSource/moddable/blob/public/xs/sources/xsSyntaxical.c">Moddable's XS</a><ul>
<li>Support: ES2018</li>
<li>Implementation language: C</li>
<li>Runtime: <a href="https://github.com/Moddable-OpenSource/moddable/blob/public/xs/sources/xsRun.c">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/Moddable-OpenSource/moddable/blob/public/xs/sources/xsSyntaxical.c">Hand-written</a></li>
<li>First release: <a href="https://www.moddable.com/XS7-TC-39">2017?</a></li>
<li>Notes: More details at <a href="https://www.moddable.com/XS7-TC-39">https://www.moddable.com/XS7-TC-39</a> and <a href="https://www.moddable.com/faq#what-is-xs">https://www.moddable.com/faq#what-is-xs</a>.</li>
</ul>
</li>
</ul>
<h4 id="other">Other</h4><ul>
<li><a href="https://github.com/facebook/hermes">Facebook's Hermes</a><ul>
<li>Support: ES6 <a href="https://hermesengine.dev/docs/language-features">with exceptions</a></li>
<li>Implementation language: C++</li>
<li>Runtime: <a href="https://github.com/facebook/hermes/tree/main/lib/VM">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/facebook/hermes/blob/main/lib/Parser/JSParserImpl.cpp">Hand-written</a></li>
<li>First release: <a href="https://github.com/facebook/hermes/releases/tag/v0.0.1">2019?</a></li>
</ul>
</li>
<li><a href="https://github.com/qt/qtdeclarative/tree/dev/src/qml/jsruntime">Qt's V4</a><ul>
<li>Support: ES5</li>
<li>Implementation language: C++</li>
<li>Runtime: <a href="https://github.com/qt/qtdeclarative/blob/dev/src/qml/jsruntime/qv4vme_moth.cpp">Bytecode VM</a> and JIT compiler</li>
<li>Parser: <a href="https://github.com/qt/qtdeclarative/blob/dev/src/qml/parser/qqmljs.g">qlalr custom parser generator</a></li>
<li>First release: 2013</li>
<li>Notes: Unclear if it can be run standalone outside of Qt.</li>
</ul>
</li>
</ul>
<p>I don't know whether to put Microsoft's ChakraCore into this list or
the next. So I'll put it here but note that as of this year 2021, they
are transitioning it to become a community-driven project.</p>
<ul>
<li><a href="https://github.com/chakra-core/ChakraCore">Microsoft's ChakraCore</a><ul>
<li>Support: ES6, probably more</li>
<li>Implementation language: C++</li>
<li>Runtime: <a href="https://github.com/chakra-core/ChakraCore/tree/master/lib/Backend">Bytecode VM and JIT on x86/ARM</a></li>
<li>Parser: <a href="https://github.com/chakra-core/ChakraCore/blob/master/lib/Parser/Parse.cpp">Hand-written</a></li>
<li>First release: 2015?</li>
</ul>
</li>
</ul>
<h3 id="mature,-community-driven">Mature, community-driven</h3><p>Implementations toward the top are more reliable and
proven. Implementations toward the bottom less so.</p>
<p>If you are looking to get involved in language development, the
implementations further down the list can be a great place to start
since they typically need work in documentation, testing, and language
features.</p>
<ul>
<li><a href="https://github.com/bellard/quickjs">Fabrice Bellard's QuickJS</a><ul>
<li>Support: ES2020</li>
<li>Implementation language: C</li>
<li>Runtime: <a href="https://raw.githubusercontent.com/bellard/quickjs/master/quickjs.c">Bytecode VM</a> (this is a single large file)</li>
<li>Parser: <a href="https://raw.githubusercontent.com/bellard/quickjs/master/quickjs.c">Hand-written</a> (this is a single large file)</li>
<li>First release: <a href="https://github.com/bellard/quickjs/commit/91459fb6723e29e923380cec0023af93819ae69d#diff-ead07c84baac57a9542f388a07a2a5209456ce790b04251bc9bd7d179ea85cb1R84">2019</a></li>
</ul>
</li>
<li><a href="https://github.com/svaarala/duktape">DuktapeJS</a><ul>
<li>Support: ES5, some parts of ES6/ES7</li>
<li>Implementation language: C</li>
<li>Runtime: <a href="https://github.com/svaarala/duktape/blob/master/src-input/duk_js_executor.c">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/svaarala/duktape/blob/master/src-input/duk_js_compiler.c">Hand-written</a>, notably with no AST. It just directly compiles to its own bytecode.</li>
<li>First release: <a href="https://duktape.org/download.html">2013</a></li>
</ul>
</li>
<li><a href="https://github.com/engine262/engine262">engine262</a><ul>
<li>Support: 100% spec compliance</li>
<li>Implementation language: JavaScript</li>
<li>Runtime: <a href="https://github.com/engine262/engine262/blob/14f50592362d889289e133fff4200e8e304c995a/src/runtime-semantics/IfStatement.mjs">AST interpreter</a></li>
<li>Parser: <a href="https://github.com/engine262/engine262/blob/main/src/parser/ExpressionParser.mjs">Hand-written</a></li>
</ul>
</li>
<li><a href="https://github.com/jerryscript-project/jerryscript">JerryScript</a><ul>
<li>Support: ES5</li>
<li>Implementation language: C</li>
<li>Runtime: <a href="https://github.com/jerryscript-project/jerryscript/blob/master/jerry-core/vm/vm.c">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/jerryscript-project/jerryscript/blob/master/jerry-core/parser/js/js-parser.c">Hand-written</a></li>
<li>First release: <a href="https://github.com/jerryscript-project/jerryscript/releases/tag/v1.0">2016?</a></li>
</ul>
</li>
<li><a href="https://github.com/SerenityOS/serenity/tree/master/Userland/Libraries/LibJS">Serenity's LibJS</a><ul>
<li>Support: <a href="https://libjs.dev/test262/">Progressing toward compliance</a></li>
<li>Implementation language: C++</li>
<li>Runtime: <a href="https://github.com/SerenityOS/serenity/tree/master/Userland/Libraries/LibJS/Bytecode">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/SerenityOS/serenity/blob/master/Userland/Libraries/LibJS/Parser.cpp">Hand-written</a></li>
<li>Notes: Might also work outside of Serenity but documentation on building/running it on Linux is hard to find.</li>
</ul>
</li>
<li><a href="https://github.com/dop251/goja">goja</a>: JavaScript interpreter for Go<ul>
<li>Support: ES5</li>
<li>Implementation language: Go</li>
<li>Runtime: <a href="https://github.com/dop251/goja/blob/master/vm.go">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/dop251/goja/blob/master/parser/statement.go">Hand-written</a></li>
<li>First release: <a href="https://github.com/dop251/goja/graphs/contributors">2017?</a></li>
</ul>
</li>
<li><a href="https://github.com/robertkrimen/otto">otto</a>: JavaScript interpreter for Go<ul>
<li>Support: ES5</li>
<li>Implementation language: Go</li>
<li>Runtime: <a href="https://github.com/robertkrimen/otto/blob/373ff54384526e8336b5b597619d0923a4a83ae0/cmpl_evaluate_expression.go#L183">AST interpreter</a></li>
<li>Parser: <a href="https://github.com/robertkrimen/otto/blob/master/parser/statement.go">Hand-written</a></li>
<li>First release: <a href="https://github.com/robertkrimen/otto/graphs/contributors">2012?</a></li>
<li>Notes: The AST interpreter-only implementation might suggest this implementation is slower than Goja. I don't have benchmarks for that.</li>
</ul>
</li>
<li><a href="https://github.com/paulbartrum/jurassic">Jurassic</a>: JavaScript parser and interpreter for .NET<ul>
<li>Support: ES5</li>
<li>Implementation language: C#</li>
<li>Runtime: Compiles to <a href="https://github.com/paulbartrum/jurassic/blob/ee6f4fa17e6205e15412a214b24d7575b0bd461c/Jurassic/Compiler/MethodGenerator/GlobalOrEvalMethodGenerator.cs#L139">.NET</a></li>
<li>Parser: <a href="https://github.com/paulbartrum/jurassic/blob/master/Jurassic/Compiler/Parser/Parser.cs">Hand-written</a></li>
<li>First release: <a href="https://github.com/paulbartrum/jurassic/graphs/contributors">2011?</a></li>
</ul>
</li>
<li><a href="https://github.com/sebastienros/jint">Jint</a><ul>
<li>Support: ES5, most of ES6/7/8</li>
<li>Implementation language: C#</li>
<li>Runtime: <a href="https://github.com/sebastienros/jint/blob/main/Jint/Runtime/Interpreter/Expressions/JintUnaryExpression.cs">AST interpreter</a></li>
<li>Parser: <a href="https://github.com/sebastienros/esprima-dotnet/blob/main/src/Esprima/JavascriptParser.cs">Hand-written via Esprima.NET</a></li>
<li>First release: <a href="https://github.com/sebastienros/jint/graphs/contributors">2014?</a></li>
<li>Notes: Thanks <a href="https://news.ycombinator.com/user?id=fowl2">fowl2</a> for mentioning!</li>
</ul>
</li>
<li><a href="https://github.com/nilproject/NiL.JS">NiL.JS</a><ul>
<li>Support: ES6</li>
<li>Implementation language: C#</li>
<li>Runtime: <a href="https://github.com/nilproject/NiL.JS/blob/develop/NiL.JS/Expressions/Assignment.cs">AST interpreter</a></li>
<li>Parser: <a href="https://github.com/nilproject/NiL.JS/blob/develop/NiL.JS/Core/Parser.cs">Hand-written</a></li>
<li>First release: <a href="https://github.com/nilproject/NiL.JS/graphs/contributors">2014?</a></li>
</ul>
</li>
<li><a href="https://github.com/NeilFraser/JS-Interpreter">Neil Fraser's JS-Interpreter</a><ul>
<li>Support: ES5</li>
<li>Implementation language: JavaScript</li>
<li>Runtime: <a href="https://github.com/NeilFraser/JS-Interpreter/blob/master/interpreter.js">AST interpreter</a></li>
<li>Parser: <a href="https://github.com/NeilFraser/JS-Interpreter/blob/master/acorn.js">Hand-written, uses Acorn</a></li>
<li>First release: <a href="https://github.com/NeilFraser/JS-Interpreter/graphs/contributors">2014?</a></li>
</ul>
</li>
<li><a href="https://github.com/BeRo1985/besen">BESEN</a>: Bytecode VM and JIT compiler in Object Pascal<ul>
<li>Support: ES5</li>
<li>Implementation language: Object Pascal</li>
<li>Runtime: <a href="https://github.com/BeRo1985/besen/blob/master/src/BESENCode.pas">Bytecode VM</a> with <a href="https://github.com/BeRo1985/besen/blob/master/src/BESENCodeJITx86.pas">JIT for x86</a> and <a href="https://github.com/BeRo1985/besen/blob/master/src/BESENCodeJITx64.pas">x86_64</a></li>
<li>Parser: <a href="https://github.com/BeRo1985/besen/blob/master/src/BESENParser.pas">Hand-written</a></li>
<li>First release: <a href="https://github.com/BeRo1985/besen/graphs/contributors">2015?</a></li>
</ul>
</li>
</ul>
<p>These last few are not toys but they are also more experimental or, in
AssemblyScript's case, not JavaScript.</p>
<ul>
<li><a href="https://github.com/boa-dev/boa">boa</a>: JavaScript interpreter for Rust<ul>
<li>Support: <a href="https://boa-dev.github.io/boa/test262/">Unclear</a></li>
<li>Implementation language: Rust</li>
<li>Runtime: <a href="https://github.com/boa-dev/boa/tree/master/boa/src/vm">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/boa-dev/boa/tree/master/boa/src/syntax/parser">Hand-written</a></li>
<li>First release: <a href="https://github.com/boa-dev/boa/releases/tag/v0.2.0">2019?</a></li>
</ul>
</li>
<li><a href="https://github.com/AssemblyScript/assemblyscript">AssemblyScript</a><ul>
<li>Support: Subset of TypeScript</li>
<li>Implementation language: AssemblyScript subset of TypeScript</li>
<li>Runtime: <a href="https://github.com/AssemblyScript/assemblyscript/blob/main/src/compiler.ts">WebAssembly</a></li>
<li>Parser: <a href="https://github.com/AssemblyScript/assemblyscript/blob/main/src/parser.ts">Hand-written</a></li>
</ul>
</li>
<li><a href="https://github.com/nickmain/kawa-scheme/tree/master/gnu/ecmascript">JavaScript in Kawa Scheme</a></li>
<li><a href="https://wingolog.org/archives/2009/02/22/ecmascript-for-guile">JavaScript in GNU Guile Scheme</a></li>
<li><a href="https://github.com/ReevaJS/reeva">ReevaJS</a><ul>
<li>Support: ES5 (with exceptions)</li>
<li>Implementation language: Kotlin</li>
<li>Runtime: <a href="https://github.com/ReevaJS/reeva/blob/master/src/main/kotlin/com/reevajs/reeva/interpreter/Interpreter.kt">Stack machine</a></li>
<li>Parser: <a href="https://github.com/ReevaJS/reeva/blob/master/src/main/kotlin/com/reevajs/reeva/parsing/Parser.kt">Hand-written</a></li>
</ul>
</li>
</ul>
<h3 id="research-implementations">Research Implementations</h3><ul>
<li><a href="https://github.com/higgsjs/Higgs">Higgs</a><ul>
<li>Support: Unclear</li>
<li>Implementation language: D</li>
<li>Runtime: <a href="https://github.com/higgsjs/Higgs/blob/master/source/runtime/vm.d">Bytecode VM</a> and <a href="https://github.com/higgsjs/Higgs/tree/master/source/jit">JIT compiler on x64</a></li>
<li>Parser: <a href="https://github.com/higgsjs/Higgs/blob/master/source/parser/parser.d">Hand-written</a></li>
</ul>
</li>
<li><a href="https://github.com/tugawa/ejs-new">eJS</a><ul>
<li>Support: Unclear</li>
<li>Implementation language: Java</li>
<li>Runtime: Bytecode VM</li>
<li>Parser: ANTLR</li>
<li>Notes: eJS is a framework to generate JavaScript VMs that are specialised for applications.</li>
</ul>
</li>
<li><a href="https://github.com/endojs/Jessie">Jessie</a>: safe subset of JavaScript for non-exploitable smart contracts<ul>
<li>Support: some subset of ES2017</li>
<li>???</li>
<li>See <a href="https://github.com/agoric-labs/jessica">https://github.com/agoric-labs/jessica</a> for more info.</li>
</ul>
</li>
<li><a href="https://github.com/b9org/b9">https://github.com/b9org/b9</a></li>
<li><a href="https://www.defensivejs.com/">https://www.defensivejs.com/</a></li>
</ul>
<p class="note">
Thanks to <a href="https://twitter.com/smarr">@smarr</a> for contributing eJS, Higgs, and b9!
</p><h3 id="notable-abandoned">Notable Abandoned</h3><ul>
<li><a href="https://github.com/DigitalMars/DMDScript">DMDScript</a><ul>
<li>Support: Unclear</li>
<li>Implementation language: D</li>
<li>Runtime: <a href="https://github.com/DigitalMars/DMDScript/blob/master/engine/source/dmdscript/opcodes.d#L15">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/DigitalMars/DMDScript/blob/master/engine/source/dmdscript/parse.d">Hand-written</a></li>
<li>Notes: It's possible this is commercially maintained by DigitalMars but I'm not sure. There are also references in this repo to another C++ implementation of DMDScript that may be commercial. Thanks to <a href="https://twitter.com/moon_chilled">@moon_chilled</a> for mentioning!</li>
</ul>
</li>
<li><a href="https://github.com/toshok/echojs">EchoJS</a><ul>
<li>Support: Unclear</li>
<li>Implementation language: JavaScript</li>
<li>Runtime: Native through LLVM</li>
<li>Parser: <a href="https://github.com/toshok/esprima/tree/e4445c9cc2530d672c4e9f68f5e2a53673b57af0">Hand-written via Esprima</a></li>
</ul>
</li>
<li><a href="https://github.com/haileys/twostroke">twostroke</a><ul>
<li>Support: Unclear</li>
<li>Implementation language: Ruby</li>
<li>Runtime: <a href="https://github.com/haileys/twostroke/blob/master/lib/twostroke/runtime/vm_frame.rb">Bytecode VM</a></li>
<li>Parser: <a href="https://github.com/haileys/twostroke/blob/master/lib/twostroke/parser.rb">Hand-written</a></li>
</ul>
</li>
<li><a href="https://github.com/progval/rpython-langjs">PyPy-JS</a><ul>
<li>Support: Unclear</li>
<li>Implementation language: RPython</li>
<li>Runtime: <a href="https://github.com/progval/rpython-langjs/blob/master/js/jscode.py">RPython</a></li>
<li>Parser: <a href="https://github.com/progval/rpython-langjs/blob/master/js/jsgrammar.txt">EBNF parser generator</a></li>
</ul>
</li>
<li><a href="https://github.com/jterrace/js.js/">js.js</a><ul>
<li>Support: Unclear</li>
<li>Implementation language: JavaScript</li>
<li>Runtime: Too scared to look at the gigantic files in this repo.</li>
<li>Parser: Ditto.</li>
</ul>
</li>
<li><a href="https://github.com/fholm/IronJS">IronJS</a><ul>
<li>Support: ES3</li>
<li>Implementation language: F#</li>
<li>Runtime: .NET through <a href="https://docs.microsoft.com/en-us/dotnet/framework/reflection-and-codedom/dynamic-language-runtime-overview">DLR</a>, I think.</li>
<li>Parser: <a href="https://github.com/fholm/IronJS/blob/master/Src/IronJS/Compiler.Parser.fs">Hand-written</a></li>
</ul>
</li>
<li><a href="https://github.com/polydojo/jispy">jispy</a><ul>
<li>Support: Unclear</li>
<li>Implementation language: Python</li>
<li>Runtime: <a href="https://github.com/polydojo/jispy/blob/master/jispy.py#L730">AST interpreter</a></li>
<li>Parser: <a href="https://github.com/polydojo/jispy/blob/master/jispy.py#L311">Unclear</a></li>
</ul>
</li>
<li><a href="https://metacpan.org/pod/JE#Simple-Use">JE: Pure-Perl JavaScript Engine</a></li>
<li><a href="https://docs.racket-lang.org/javascript/index.html">Dave Herman's JavaScript for PLT Scheme</a></li>
</ul>
<h3 id="notable-toy-implementations">Notable toy implementations</h3><p>Great for inspiration if you've never implemented a language before.</p>
<ul>
<li><a href="https://github.com/timruffles/js-to-c">js-to-c</a>: A JavaScript to C compiler, written in C</li>
<li><a href="https://github.com/mras0/mjs">mjs</a>: AST interpreter for not just ES5 or even ES3 but also ES1</li>
<li><a href="https://github.com/gojisvm/gojis">gojis</a>: AST interpreter in Go</li>
<li><a href="https://github.com/DelSkayn/toyjs">toyjs</a>: Bytecode VM in Rust</li>
<li><a href="https://github.com/CrimsonAS/v2">v2</a>: Bytecode VM in Go</li>
<li><a href="https://github.com/githubyang/SparrowJS">SparrowJS</a>: AST interpreter in C++</li>
<li><a href="https://github.com/eatonphil/jsc">jsc</a>: My own experiment compiling JavaScript to C++/libV8</li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New post is up! Enumerating and analyzing 40+ non-V8 JavaScript implementations; of course with links to source code and parser & runtime/backend decisions.<br><br>I hope you enjoy learning about JavaScript engines as much as I did. 😁<a href="https://t.co/dEX06WU38f">https://t.co/dEX06WU38f</a> <a href="https://t.co/AoYScphG6m">pic.twitter.com/AoYScphG6m</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1440436962305789952?ref_src=twsrc%5Etfw">September 21, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/javascript-implementations.htmlTue, 21 Sep 2021 00:00:00 +0000
- Writing a simple JSON library from scratch: a tour through modern C++http://notes.eatonphil.com/writing-a-simple-json-library-in-modern-cpp.html<p>Modern C++ has a lot of cool features. Move semantics means passing
around structs in functions is cheap. <code>std::shared_ptr</code>
means I don't have to manage any memory; no
more <code>new</code>/<code>delete</code>! (But try as I might to
understand <code>std::unique_ptr</code>, I'm just not there yet.)</p>
<p>The syntax has also gotten some treatment with <code>auto</code> and
tuple destructuring.</p>
<p>In order to test out this modern C++ I wanted a small but meaningful
project that operates on very dynamic data. The two that always come
to mind are JSON parsers or Lisp interpreters.</p>
<p>This post walks through
writing a basic JSON library from scratch using only the standard
library. The source code for the resulting library is available <a href="https://github.com/eatonphil/cpp-json">on
Github</a>.</p>
<p>The biggest simplification we'll make is that rather than full JSON
numbers, we'll only allow integers.</p>
<p class="note">
Big caveat! I couldn't be farther from a C++ expert! Email or tweet
me as you see mistakes, madness, lies.
</p><h3 id="api">API</h3><p>The two big parts of the API will be about lexing (turning a string
into an array of tokens) and parsing (turning an array of tokens into
a JSON object-tree). A better implementation would implement the lexer
as taking a character stream rather than a string, but taking a string
is simpler. So we'll stick with that.</p>
<p>Both of these functions can fail, so each returns a tuple whose last
element is an error message string. The string is empty on success.</p>
<p>We will define the header in <code>./include/json.hpp</code>.</p>
<div class="highlight"><pre><span></span><span class="cp">#ifndef JSON_H</span>
<span class="cp">#define JSON_H</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><tuple></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><vector></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><string></span>
<span class="k">namespace</span><span class="w"> </span><span class="nn">json</span><span class="w"> </span><span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">JSONToken</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">lex</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">JSONToken</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span><span class="w"> </span><span class="c1">// namespace json</span>
<span class="cp">#endif</span>
</pre></div>
<p>The token returned by <code>lex</code> will need to contain the
token's string value, the location (offset) in the original source, a
pointer to the full source (for debugging), and the token's type. The
token type itself will be an enum of either string, number, syntax
(colon, bracket, etc.), boolean, or null.</p>
<div class="highlight"><pre><span></span><span class="p">...</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><string></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><memory></span>
<span class="k">namespace</span><span class="w"> </span><span class="nn">json</span><span class="w"> </span><span class="p">{</span>
<span class="k">enum</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="nc">JSONTokenType</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">Number</span><span class="p">,</span><span class="w"> </span><span class="n">Syntax</span><span class="p">,</span><span class="w"> </span><span class="n">Boolean</span><span class="p">,</span><span class="w"> </span><span class="n">Null</span><span class="w"> </span><span class="p">};</span>
<span class="k">struct</span><span class="w"> </span><span class="nc">JSONToken</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">value</span><span class="p">;</span>
<span class="w"> </span><span class="n">JSONTokenType</span><span class="w"> </span><span class="n">type</span><span class="p">;</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">location</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">full_source</span><span class="p">;</span>
<span class="p">};</span>
<span class="p">...</span>
<span class="p">}</span><span class="w"> </span><span class="c1">// namespace json</span>
<span class="p">...</span>
</pre></div>
<p>This is the only place in the entire code we'll pass around a
pointer. Using <code>std::shared_ptr</code> means we don't have to do
any manual memory management either. No <code>new</code> or
<code>delete</code>.</p>
<p>Next, <code>JSONValue</code> is a struct containing optional string,
boolean, number, array, and object fields with a type enum to
differentiate between them.</p>
<div class="highlight"><pre><span></span><span class="p">...</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><map></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><optional></span>
<span class="k">namespace</span><span class="w"> </span><span class="nn">json</span><span class="w"> </span><span class="p">{</span>
<span class="k">enum</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="nc">JSONValueType</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">Number</span><span class="p">,</span><span class="w"> </span><span class="n">Object</span><span class="p">,</span><span class="w"> </span><span class="n">Array</span><span class="p">,</span><span class="w"> </span><span class="n">Boolean</span><span class="p">,</span><span class="w"> </span><span class="n">Null</span><span class="w"> </span><span class="p">};</span>
<span class="k">struct</span><span class="w"> </span><span class="nc">JSONValue</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">string</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o"><</span><span class="kt">double</span><span class="o">></span><span class="w"> </span><span class="n">number</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o"><</span><span class="kt">bool</span><span class="o">></span><span class="w"> </span><span class="n">boolean</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">JSONValue</span><span class="o">>></span><span class="w"> </span><span class="n">array</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">map</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">JSONValue</span><span class="o">>></span><span class="w"> </span><span class="n">object</span><span class="p">;</span>
<span class="w"> </span><span class="n">JSONValueType</span><span class="w"> </span><span class="n">type</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">enum</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="nc">JSONTokenType</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">Number</span><span class="p">,</span><span class="w"> </span><span class="n">Syntax</span><span class="p">,</span><span class="w"> </span><span class="n">Boolean</span><span class="p">,</span><span class="w"> </span><span class="n">Null</span><span class="w"> </span><span class="p">};</span>
<span class="p">...</span>
</pre></div>
<p>Thanks to <code>std::optional</code> we can avoid using pointers to
describe these fields. I did take a look at <code>std::variant</code>
but it seemed like its API was overly complex.</p>
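<p>To make the "optional fields plus a type tag" idea concrete, here is a
minimal sketch of building and reading such a value. It uses a trimmed-down,
hypothetical <code>Value</code> struct with scalar fields only so that it
compiles on its own; <code>make_number</code> and <code>describe</code> are
illustrative helpers and not part of the library.</p>

```cpp
#include <cassert>
#include <optional>
#include <string>

// Trimmed-down stand-in for JSONValue (scalar fields only) so the
// snippet compiles on its own. The names here are hypothetical.
enum class ValueType { String, Number, Boolean, Null };

struct Value {
  std::optional<std::string> string;
  std::optional<double> number;
  std::optional<bool> boolean;
  ValueType type = ValueType::Null;
};

// Build a number value: set the matching field and the tag together.
Value make_number(double n) {
  Value v;
  v.number = n;
  v.type = ValueType::Number;
  return v;
}

// Read a value by switching on the tag, then unwrapping the optional.
std::string describe(const Value &v) {
  switch (v.type) {
  case ValueType::String:
    return "string: " + v.string.value_or("");
  case ValueType::Number:
    return "number: " + std::to_string(v.number.value_or(0.0));
  case ValueType::Boolean:
    return v.boolean.value_or(false) ? "boolean: true" : "boolean: false";
  case ValueType::Null:
    return "null";
  }
  return "unknown";
}

void describe_example() {
  Value n = make_number(1.5);
  // Only the field that matches the tag is populated
  assert(n.number.has_value() && !n.string.has_value());
  assert(describe(n) == "number: 1.500000"); // std::to_string uses %f formatting
  assert(describe(Value{}) == "null");
}
```

<p>The tag is what callers branch on; the optionals only hold the payload.
Nothing stops the tag and the populated field from disagreeing, which is the
trade-off of this approach compared to <code>std::variant</code>.</p>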
<p>Finally, we'll add two more functions: a high-level <code>parse</code>
function that combines the job of lexing and parsing, and a
<code>deparse</code> function for printing a <code>JSONValue</code> as
a JSON string.</p>
<div class="highlight"><pre><span></span><span class="p">...</span>
<span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">JSONToken</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">);</span>
<span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">deparse</span><span class="p">(</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">""</span><span class="p">);</span>
<span class="p">}</span><span class="w"> </span><span class="c1">// namespace json</span>
<span class="p">...</span>
</pre></div>
<p>Now we're ready to start on the implementation.</p>
<h3 id="lexing">Lexing</h3><p>First up is lexing: turning a JSON string into an array of tokens, where
each token is a number, string, null keyword, boolean keyword, or syntax
like comma or colon.</p>
<p>The main lex loop skips whitespace and calls helper functions for each
kind of token. If a token is found, we accumulate it and move to the
end of that token (some tokens like <code>:</code> are a single
character, some tokens like <code>"my great string"</code> are
multiple characters).</p>
<p>Each token we find gets a pointer to the original JSON source for use
in error messages if parsing fails. Again this will be the only time
we explicitly pass around pointers in this implementation. We don't do
any manual memory management because we're going to use
<code>std::shared_ptr</code>.</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"json.hpp"</span>
<span class="k">namespace</span><span class="w"> </span><span class="nn">json</span><span class="w"> </span><span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">JSONToken</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">lex</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">JSONToken</span><span class="o">></span><span class="w"> </span><span class="n">tokens</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// All tokens will embed a pointer to the raw JSON for debugging purposes</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">original_copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="p">(</span><span class="n">raw_json</span><span class="p">);</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">generic_lexers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="n">lex_syntax</span><span class="p">,</span><span class="w"> </span><span class="n">lex_string</span><span class="p">,</span><span class="w"> </span><span class="n">lex_number</span><span class="p">,</span><span class="w"> </span><span class="n">lex_null</span><span class="p">,</span><span class="w"> </span><span class="n">lex_true</span><span class="p">,</span><span class="w"> </span><span class="n">lex_false</span><span class="p">};</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">raw_json</span><span class="p">.</span><span class="n">length</span><span class="p">();</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Skip past whitespace</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">new_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lex_whitespace</span><span class="p">(</span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">new_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_index</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">found</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">lexer</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">generic_lexers</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexer</span><span class="p">(</span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">new_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Error while lexing, return early</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">length</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">error</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Store reference to the original source</span>
<span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">full_source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">original_copy</span><span class="p">;</span>
<span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">token</span><span class="p">);</span>
<span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_index</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="n">found</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">true</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">found</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">format_error</span><span class="p">(</span><span class="s">"Unable to lex"</span><span class="p">,</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">)};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="p">}</span>
<span class="p">}</span><span class="w"> </span><span class="c1">// namespace json</span>
</pre></div>
<p>Two neat things you'll notice in there are brace-initialized tuples
(<code>{tokens, ""}</code>) and how easy it is to declare a list of
function pointers using <code>auto</code>
(<code>generic_lexers</code>, which is deduced as a
<code>std::initializer_list</code> of function pointers).</p>
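<p>Here is a small, self-contained sketch of both features outside of the
lexer. The <code>match_a</code>, <code>match_b</code>, and
<code>take_ab_prefix</code> functions are hypothetical and exist only for
illustration. They follow the same convention as the lexers: an unchanged
index means "no match here".</p>

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <tuple>

// Hypothetical matchers sharing one signature: each returns
// {matched text, new index}. An unchanged index means "no match".
std::tuple<std::string, int> match_a(const std::string &s, int i) {
  if (i < static_cast<int>(s.size()) && s[i] == 'a') {
    return {"a", i + 1}; // braces construct the tuple in place
  }
  return {"", i};
}

std::tuple<std::string, int> match_b(const std::string &s, int i) {
  if (i < static_cast<int>(s.size()) && s[i] == 'b') {
    return {"b", i + 1};
  }
  return {"", i};
}

// Consume the longest prefix made of 'a' and 'b' characters.
std::string take_ab_prefix(const std::string &s) {
  // `auto` deduces a std::initializer_list of function pointers here,
  // which only works because both functions have the same signature.
  auto matchers = {match_a, match_b};
  std::string out;
  int i = 0;
  bool progressed = true;
  while (progressed) {
    progressed = false;
    for (auto matcher : matchers) {
      if (auto [text, next] = matcher(s, i); next != i) {
        out += text;
        i = next;
        progressed = true;
        break;
      }
    }
  }
  return out;
}

void take_ab_prefix_example() {
  assert(take_ab_prefix("abba!x") == "abba"); // stops at the first non-a/b character
  assert(take_ab_prefix("xyz").empty());      // nothing matches at index 0
}
```

<p>If the functions had different signatures, the <code>auto</code>
declaration would fail to compile because no single element type could be
deduced.</p>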
<h4 id="format_error">format_error</h4><p>Since we referenced <code>format_error</code>, let's define it. This
needs to accept a message prefix, the full JSON string, and the index
offset the error should point to.</p>
<p>Inside the function we'll iterate over the string until we find the
entire line containing this index offset. We'll display that line and
a pointer to the character that is causing/starting the error.</p>
<div class="highlight"><pre><span></span><span class="p">...</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><sstream></span>
<span class="k">namespace</span><span class="w"> </span><span class="nn">json</span><span class="w"> </span><span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="nf">format_error</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">base</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">ostringstream</span><span class="w"> </span><span class="n">s</span><span class="p">;</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">counter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">source</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">counter</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\n'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">line</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\t'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">column</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">" "</span><span class="p">;</span>
<span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">" "</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">column</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">c</span><span class="p">;</span>
<span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">" "</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">counter</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Continue accumulating the lastline for debugging</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">counter</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">counter</span><span class="p">];</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'\n'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">c</span><span class="p">;</span>
<span class="w"> </span><span class="n">counter</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">" at line "</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">", column "</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">"^"</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">s</span><span class="p">.</span><span class="n">str</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">...</span>
</pre></div>
<p>The <code>printf</code> API is annoying and Clang 12 (latest Clang on
latest Fedora) doesn't seem to support <code>std::format</code>. So we
just use
<code>std::ostringstream</code> to do string "formatting".</p>
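<p>Stripped of the line-finding logic, the stream-based formatting amounts
to something like the sketch below. The <code>point_at</code> helper is
hypothetical; it assumes the caller already knows the line text and a
non-negative, zero-based column.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: report a position and underline it with a caret.
// `column` is the zero-based offset of the offending character in `line_text`.
std::string point_at(const std::string &message, const std::string &line_text,
                     int line, int column) {
  std::ostringstream s;
  // operator<< handles the int-to-text conversion for us
  s << message << " at line " << line << ", column " << column << '\n';
  s << line_text << '\n';
  // Pad with spaces so the caret lands under the offending character
  s << std::string(static_cast<std::size_t>(column), ' ') << '^';
  return s.str();
}

void point_at_example() {
  const std::string report = point_at("Unable to lex", "{\"a\": ?}", 1, 6);
  assert(report == "Unable to lex at line 1, column 6\n"
                   "{\"a\": ?}\n"
                   "      ^");
}
```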
<p>But ok, back to lexing! Next up: whitespace.</p>
<h4 id="lex_whitespace">lex_whitespace</h4><p>This function's job is to skip past whitespace. Thankfully we've got
<code>std::isspace</code> to help.</p>
<div class="highlight"><pre><span></span><span class="kt">int</span><span class="w"> </span><span class="nf">lex_whitespace</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Check the bound first, and cast because std::isspace is undefined for negative char values</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">raw_json</span><span class="p">.</span><span class="n">length</span><span class="p">()</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">isspace</span><span class="p">(</span><span class="k">static_cast</span><span class="o"><</span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">char</span><span class="o">></span><span class="p">(</span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">])))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">index</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>It's very simple!</p>
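<p>The part that is easy to get wrong is on the caller's side: the main
loop sets <code>i = new_index - 1</code> because the <code>for</code> loop's
own <code>i++</code> adds one back. Here is a hedged, stand-alone sketch of
that loop shape. The <code>skip_whitespace</code> and
<code>strip_whitespace</code> names are hypothetical stand-ins, not the
library's functions.</p>

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Stand-alone whitespace skipper: returns the index of the first
// non-whitespace character at or after `index`, or the string length.
int skip_whitespace(const std::string &raw_json, int index) {
  const int length = static_cast<int>(raw_json.length());
  while (index < length &&
         std::isspace(static_cast<unsigned char>(raw_json[index]))) {
    index++;
  }
  return index;
}

// Collect the non-whitespace characters, using the same loop shape as `lex`.
std::string strip_whitespace(const std::string &source) {
  std::string out;
  for (int i = 0; i < static_cast<int>(source.length()); i++) {
    if (auto new_index = skip_whitespace(source, i); i != new_index) {
      // The loop's i++ adds one back, so land one short of new_index
      i = new_index - 1;
      continue;
    }
    out += source[i];
  }
  return out;
}

void strip_whitespace_example() {
  assert(skip_whitespace(" \n\t{ }", 0) == 3); // stops at '{'
  assert(skip_whitespace(" \n\t{ }", 3) == 3); // no whitespace: index unchanged
  assert(strip_whitespace(" { \"a\" : 1 }\n") == "{\"a\":1}");
}
```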
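<h4 id="lex_syntax">lex_syntax</h4><p>All of the generic lexers follow the same pattern. They return either
a valid token and the index where the token ends, or they return an
error string. If a lexer doesn't recognize the input at all, it returns
the index unchanged, which tells <code>lex</code> to try the next lexer.</p>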
<p>Since all the syntax elements in JSON (<code>,</code>, <code>:</code>,
<code>{</code>, <code>}</code>, <code>[</code>, and <code>]</code>)
are single characters, we don't need to write a "longest substring"
helper function. We simply check if the current character is one of
these characters and return a syntax token if so.</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">lex_syntax</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">JSONToken</span><span class="w"> </span><span class="n">token</span><span class="p">{</span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Syntax</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">};</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">""</span><span class="p">;</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'['</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">']'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'{'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'}'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">':'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">','</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">c</span><span class="p">;</span>
<span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
<h3 id="lex_string">lex_string</h3><p>This one manages state so it's a little more complex. We need to check
if the current character is a double quote, then iterate over
characters until we find the ending quote.</p>
<p>It's possible to hit EOF here so we need to handle that case. And
handling escaped quotes (a <code>\"</code> inside a string) is left as an
exercise for the reader. :)</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">lex_string</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">original_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">original_index</span><span class="p">;</span>
<span class="w"> </span><span class="n">JSONToken</span><span class="w"> </span><span class="n">token</span><span class="p">{</span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">};</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'"'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">original_index</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// TODO: handle escaped quotes (\")</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">],</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">'"'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw_json</span><span class="p">.</span><span class="n">length</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">format_error</span><span class="p">(</span><span class="s">"Unexpected EOF while lexing string"</span><span class="p">,</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">)};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">c</span><span class="p">;</span>
<span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
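<p>For the exercise above (a <code>\"</code> inside a string), here is one
possible sketch. It is not the library's code: <code>lex_string_escaped</code>
is a hypothetical standalone function that returns the string contents
directly instead of a <code>JSONToken</code>, and it assumes that only
<code>\"</code> and <code>\\</code> need translating. Any other escape is
copied through untouched.</p>

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical variant of lex_string that understands \" and \\.
// Returns {string contents, index after the closing quote, error message}.
// If the input does not start with a quote, the index comes back unchanged.
std::tuple<std::string, int, std::string>
lex_string_escaped(const std::string &raw_json, int original_index) {
  int index = original_index;
  int length = static_cast<int>(raw_json.length());
  if (index >= length || raw_json[index] != '"') {
    return {"", original_index, ""};
  }
  index++; // skip the opening quote

  std::string value;
  while (index < length) {
    char c = raw_json[index];
    if (c == '"') {
      return {value, index + 1, ""}; // skip the closing quote
    }
    if (c == '\\' && index + 1 < length) {
      char next = raw_json[index + 1];
      if (next == '"' || next == '\\') {
        value += next; // keep the escaped character, drop the backslash
        index += 2;
        continue;
      }
    }
    value += c; // ordinary character (or an escape this sketch ignores)
    index++;
  }
  return {value, index, "Unexpected EOF while lexing string"};
}

// Small self-check; call it from main() to try the function out.
void demo_lex_string_escaped() {
  auto [value, index, error] = lex_string_escaped(R"("a\"b" rest)", 0);
  assert(value == "a\"b");
  assert(index == 6);
  assert(error.empty());
}
```

<p>The only real change from <code>lex_string</code> is that a backslash
makes the loop look one character ahead before deciding whether a quote ends
the string.</p>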
<p>Nothing too special to discuss here. So on to lexing numbers.</p>
<h3 id="lex_number">lex_number</h3><p>Since we're only supporting integers, this one has no internal
state. We check characters until we stop seeing digits.</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">lex_number</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">original_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">original_index</span><span class="p">;</span>
<span class="w"> </span><span class="n">JSONToken</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Number</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">};</span>
<span class="w"> </span><span class="c1">// TODO: handle not just integers</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="nb">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw_json</span><span class="p">.</span><span class="n">length</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="sc">'0'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="sc">'9'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">c</span><span class="p">;</span>
<span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
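<p>If you want to take a stab at the <code>TODO</code>, here is a rough sketch
of matching more than plain integers. The name <code>lex_decimal</code> and
its return type are made up for illustration. It accepts an optional leading
<code>-</code> and an optional fractional part, and deliberately ignores
exponents like <code>1e5</code>.</p>

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: consumes an optional '-', digits, and an optional
// fractional part. Exponents (1e5) are not handled in this sketch.
// Returns {matched text, index after the match}. On no match the text is
// empty and the index comes back unchanged.
std::pair<std::string, int> lex_decimal(const std::string &raw_json,
                                        int original_index) {
  int length = static_cast<int>(raw_json.length());
  int index = original_index;
  auto is_digit = [&](int i) {
    return i < length && raw_json[i] >= '0' && raw_json[i] <= '9';
  };

  if (index < length && raw_json[index] == '-') {
    index++;
  }
  if (!is_digit(index)) {
    return {"", original_index}; // a lone '-' or a non-number
  }
  while (is_digit(index)) {
    index++;
  }

  // Only treat '.' as part of the number when a digit follows it.
  if (index < length && raw_json[index] == '.' && is_digit(index + 1)) {
    index++;
    while (is_digit(index)) {
      index++;
    }
  }
  return {raw_json.substr(original_index, index - original_index), index};
}

// Small self-check; call it from main() to try the function out.
void demo_lex_decimal() {
  auto [text, index] = lex_decimal("-12.5,", 0);
  assert(text == "-12.5");
  assert(index == 5);
  assert(std::stod(text) == -12.5);
}
```

<p>Since the parser converts number tokens with <code>std::stod</code>, text
in this shape should convert without further changes.</p>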
<p>Done. On to keywords: <code>null</code>, <code>false</code>, <code>true</code>.</p>
<h3 id="lex_keyword">lex_keyword</h3><p>This is a helper function that will check for a literal keyword.</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">lex_keyword</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">keyword</span><span class="p">,</span>
<span class="w"> </span><span class="n">JSONTokenType</span><span class="w"> </span><span class="n">type</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">original_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">original_index</span><span class="p">;</span>
<span class="w"> </span><span class="n">JSONToken</span><span class="w"> </span><span class="n">token</span><span class="p">{</span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">};</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">keyword</span><span class="p">[</span><span class="n">index</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">original_index</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw_json</span><span class="p">.</span><span class="n">length</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">original_index</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">keyword</span><span class="p">.</span><span class="n">length</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Partial or no match: report no progress, as lex_string does</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">original_index</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">keyword</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
<p>With this defined we can now implement <code>lex_false</code>,
<code>lex_true</code>, and <code>lex_null</code>.</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">lex_null</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">lex_keyword</span><span class="p">(</span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="s">"null"</span><span class="p">,</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Null</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">lex_true</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">lex_keyword</span><span class="p">(</span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="s">"true"</span><span class="p">,</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Boolean</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">lex_false</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">lex_keyword</span><span class="p">(</span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="s">"false"</span><span class="p">,</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Boolean</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>And that's it for lexing! Although we defined all of these
top-down, C++ needs a function to be declared before it is used, so
you'll want to write them mostly in reverse order or put in
forward declarations.</p>
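<p>To make the ordering point concrete, here is a tiny sketch of a forward
declaration. The function names are invented for this example and are not
part of the library.</p>

```cpp
#include <cassert>
#include <string>

// Forward declaration: gives the compiler the signature before the body.
int count_digits(const std::string &text, int index);

// This can call count_digits even though its definition comes later.
bool starts_with_number(const std::string &text) {
  return count_digits(text, 0) > 0;
}

// The definition can now live anywhere below the declaration.
int count_digits(const std::string &text, int index) {
  int length = static_cast<int>(text.length());
  int count = 0;
  while (index + count < length && text[index + count] >= '0' &&
         text[index + count] <= '9') {
    count++;
  }
  return count;
}

// Small self-check; call it from main() to try the functions out.
void demo_forward_declaration() {
  assert(starts_with_number("12a"));
  assert(!starts_with_number("a12"));
}
```

<p>Without the declaration at the top, <code>starts_with_number</code> would
fail to compile because <code>count_digits</code> is not yet known at that
point in the file.</p>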
<p>If you wanted to you could now write a simple <code>main.cpp</code>
like:</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"json.hpp"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><iostream></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">argc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cerr</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">"Expected JSON input argument to parse"</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">in</span><span class="p">{</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]};</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">json</span><span class="o">::</span><span class="n">lex</span><span class="p">(</span><span class="n">in</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cerr</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">error</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">tokens</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cout</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Set up a Makefile:</p>
<div class="highlight"><pre><span></span><span class="nf">main</span><span class="o">:</span><span class="w"> </span>*.<span class="n">cpp</span> ./<span class="n">include</span>/*.<span class="n">hpp</span>
<span class="w"> </span>clang++<span class="w"> </span>-g<span class="w"> </span>-Wall<span class="w"> </span>-std<span class="o">=</span>c++2a<span class="w"> </span>-I./include<span class="w"> </span>*.cpp<span class="w"> </span>-o<span class="w"> </span><span class="nv">$@</span>
</pre></div>
<p>Build with <code>make</code> and run <code>./main '{"a": 1}'</code>
to see the list of tokens printed out.</p>
<p>Now let's move on to parsing from the array of tokens.</p>
<h3 id="parsing">Parsing</h3><p>This process takes the array of tokens and turns them into a tree
structure. A new child subtree starts whenever we spot a <code>[</code> or
<code>{</code> token, and it ends when we spot the matching <code>]</code>
or <code>}</code> token.</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">JSONToken</span><span class="o">></span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span>
<span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">token</span><span class="p">.</span><span class="n">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Number</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">stod</span><span class="p">(</span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">Number</span><span class="p">},</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Boolean</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">boolean</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"true"</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">Boolean</span><span class="p">},</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Null</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">Null</span><span class="p">},</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">String</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">string</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">String</span><span class="p">},</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Syntax</span><span class="p">:</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"["</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">array</span><span class="p">,</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_array</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">array</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">Array</span><span class="p">},</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"{"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">object</span><span class="p">,</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_object</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">object</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="p">(</span><span class="n">object</span><span class="p">),</span><span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">Object</span><span class="p">},</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">"Failed to parse"</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">)};</span>
<span class="p">}</span>
</pre></div>
<p>This in turn references <code>format_parse_error</code> on failure,
which is an error-string-maker similar to
<code>format_error</code>. It actually calls <code>format_error</code>
with more details specific to parsing.</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="nf">JSONTokenType_to_string</span><span class="p">(</span><span class="n">JSONTokenType</span><span class="w"> </span><span class="n">jtt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">jtt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">String</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"String"</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Number</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Number"</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Syntax</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Syntax"</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Boolean</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Boolean"</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Null</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"Null"</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="nf">format_parse_error</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">base</span><span class="p">,</span><span class="w"> </span><span class="n">JSONToken</span><span class="w"> </span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">ostringstream</span><span class="w"> </span><span class="n">s</span><span class="p">;</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">"Unexpected token '"</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">"', type '"</span>
<span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">JSONTokenType_to_string</span><span class="p">(</span><span class="n">token</span><span class="p">.</span><span class="n">type</span><span class="p">)</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">"', index "</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">location</span><span class="p">;</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">base</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">format_error</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">str</span><span class="p">(),</span><span class="w"> </span><span class="o">*</span><span class="n">token</span><span class="p">.</span><span class="n">full_source</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">location</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p class="note">
This function depended on a helper for turning the
<code>JSONTokenType</code> enum into a string. As a user it's very
annoying when languages don't give you stringifier methods for enums
by default for debugging. I know there are some ways to do this with
reflection in C++ but it seemed hairy.
But I digress.
</p><h4 id="parse_array">parse_array</h4><p>This function was called by <code>parse</code> when we found an
opening bracket. This function needs to recursively call parse and
then check for a comma and call parse again ... until it finds the
closing bracket.</p>
<p>It will fail if it ever finds something other than a comma or closing
bracket following a successful call to <code>parse</code>.</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">JSONValue</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span>
<span class="n">parse_array</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">JSONToken</span><span class="o">></span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">JSONValue</span><span class="o">></span><span class="w"> </span><span class="n">children</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Syntax</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"]"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">children</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">","</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">children</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span>
<span class="w"> </span><span class="n">index</span><span class="p">,</span>
<span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">"Expected comma after element in array"</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">child</span><span class="p">,</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">children</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">child</span><span class="p">);</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_index</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="n">index</span><span class="p">,</span>
<span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">"Unexpected EOF while parsing array"</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">back</span><span class="p">())};</span>
<span class="p">}</span>
</pre></div>
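<p>To see the comma/closing-bracket loop in isolation, here is a minimal
sketch with bounds checks on every token access. It is not the parser
from this post: <code>parse_int_list</code> is a hypothetical name,
tokens are plain strings, and every element is assumed to be a small
non-negative integer. Unlike the code above, it also rejects a leading
comma.</p>

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Simplified stand-in for the parser above: tokens are plain strings and
// every element is assumed to be a small non-negative integer.
// `index` points at the first token after the opening bracket.
std::tuple<std::vector<int>, std::size_t, std::string>
parse_int_list(const std::vector<std::string> &tokens, std::size_t index) {
  std::vector<int> children;
  while (index < tokens.size()) {
    if (tokens[index] == "]") {
      return {children, index + 1, ""};
    }
    if (!children.empty()) {
      // Every element after the first must be preceded by a comma.
      if (tokens[index] != ",") {
        return {std::vector<int>{}, index,
                "Expected comma after element in array"};
      }
      index++;
      if (index >= tokens.size()) {
        break; // Ran out of tokens right after a comma.
      }
    }
    const std::string &t = tokens[index];
    if (t.empty() || t.size() > 6 ||
        t.find_first_not_of("0123456789") != std::string::npos) {
      return {std::vector<int>{}, index,
              "Expected a small number, got '" + t + "'"};
    }
    children.push_back(std::stoi(t));
    index++;
  }
  // The loop only ends here when no closing bracket was found.
  return {std::vector<int>{}, index, "Unexpected EOF while parsing array"};
}
```

<p>Calling <code>parse_int_list({"[", "1", ",", "2", "]"}, 1)</code>
yields the two numbers, the index just past the closing bracket, and an
empty error string.</p>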
<p>And finally we need to implement <code>parse_object</code>.</p>
<h4 id="parse_object">parse_object</h4><p>This function is similar to <code>parse_array</code> but it needs to
find <code>$string COLON $parse() COMMA</code> pattern pairs.</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">map</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">JSONValue</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span>
<span class="n">parse_object</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">JSONToken</span><span class="o">></span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">map</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">JSONValue</span><span class="o">></span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Syntax</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"}"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">values</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">","</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">values</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="n">index</span><span class="p">,</span>
<span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">"Expected comma after element in object"</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)};</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span>
<span class="w"> </span><span class="n">index</span><span class="p">,</span>
<span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span>
<span class="w"> </span><span class="s">"Expected key-value pair or closing brace in object"</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">key</span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">"Expected string key in object"</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_index</span><span class="p">;</span>
<span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Syntax</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">":"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span>
<span class="w"> </span><span class="n">index</span><span class="p">,</span>
<span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">"Expected colon after key in object"</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">new_index1</span><span class="p">,</span><span class="w"> </span><span class="n">error1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error1</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">error1</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">values</span><span class="p">[</span><span class="n">key</span><span class="p">.</span><span class="n">string</span><span class="p">.</span><span class="n">value</span><span class="p">()]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span>
<span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_index1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">"Unexpected EOF while parsing object"</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">back</span><span class="p">())};</span>
<span class="p">}</span>
</pre></div>
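<p>The key, colon, value, comma pattern can also be sketched on its own.
This is a simplified illustration under stated assumptions, not the
function above: <code>parse_flat_object</code> and <code>Object</code>
are hypothetical names, tokens are plain strings, a key is any token
wrapped in double quotes, and the value is whatever single token follows
the colon, kept as text with no validation.</p>

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical flat object type: values are kept as raw token text.
using Object = std::map<std::string, std::string>;

// `index` points at the first token after the opening brace.
std::tuple<Object, std::size_t, std::string>
parse_flat_object(const std::vector<std::string> &tokens, std::size_t index) {
  Object values;
  while (index < tokens.size()) {
    if (tokens[index] == "}") {
      return {values, index + 1, ""};
    }
    if (!values.empty()) {
      // Every pair after the first must be preceded by a comma.
      if (tokens[index] != ",") {
        return {Object{}, index, "Expected comma after element in object"};
      }
      index++;
      if (index >= tokens.size()) {
        break; // Ran out of tokens right after a comma.
      }
    }
    const std::string &key = tokens[index];
    if (key.size() < 2 || key.front() != '"' || key.back() != '"') {
      return {Object{}, index, "Expected string key in object"};
    }
    if (index + 1 >= tokens.size() || tokens[index + 1] != ":") {
      return {Object{}, index + 1, "Expected colon after key in object"};
    }
    if (index + 2 >= tokens.size()) {
      break; // Colon with no value after it.
    }
    // Strip the surrounding quotes before storing the key.
    values[key.substr(1, key.size() - 2)] = tokens[index + 2];
    index += 3;
  }
  // The loop only ends here when no closing brace was found.
  return {Object{}, index, "Unexpected EOF while parsing object"};
}
```

<p>Checking the bounds before each token access keeps truncated input
such as <code>{"a":</code> on the error path instead of reading past the
end of the token vector.</p>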
<p>These parse functions are all slightly tedious but still very
simple. And thankfully, we're done!</p>
<p>We can now implement the variation of <code>parse</code> that ties
together lexing and parsing.</p>
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">></span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">source</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">json</span><span class="o">::</span><span class="n">lex</span><span class="p">(</span><span class="n">source</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">error</span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">ast</span><span class="p">,</span><span class="w"> </span><span class="n">_</span><span class="p">,</span><span class="w"> </span><span class="n">error1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">json</span><span class="o">::</span><span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">ast</span><span class="p">,</span><span class="w"> </span><span class="n">error1</span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
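<p>The shape of this function, where each stage returns its result next
to an error string and the caller stops at the first non-empty error,
does not depend on JSON at all. Here is a small self-contained sketch of
the same pattern. The stage names <code>split_words</code>,
<code>sum_words</code> and <code>sum_source</code> are made up for
illustration.</p>

```cpp
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical first stage: split on whitespace, failing on empty input.
std::tuple<std::vector<std::string>, std::string>
split_words(const std::string &source) {
  std::istringstream in(source);
  std::vector<std::string> words;
  for (std::string w; in >> w;) {
    words.push_back(w);
  }
  if (words.empty()) {
    return {std::vector<std::string>{}, "No words found"};
  }
  return {words, ""};
}

// Hypothetical second stage: add up words that must all be short runs of
// decimal digits.
std::tuple<int, std::string> sum_words(const std::vector<std::string> &words) {
  int total = 0;
  for (const auto &w : words) {
    if (w.size() > 6 ||
        w.find_first_not_of("0123456789") != std::string::npos) {
      return {0, "Not a small number: " + w};
    }
    total += std::stoi(w);
  }
  return {total, ""};
}

// Ties both stages together and returns the first error encountered.
std::tuple<int, std::string> sum_source(const std::string &source) {
  auto [words, error] = split_words(source);
  if (error.size()) {
    return {0, error};
  }
  auto [total, error1] = sum_words(words);
  return {total, error1};
}
```

<p>A caller only ever has to check one error string, no matter which
stage failed.</p>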
<p>And we're completely done with the string to <code>JSONValue</code> code.</p>
<h3 id="deparse">deparse</h3><p>The very last piece of the implementation is to do the reverse of the
past operations: generate a string from a <code>JSONValue</code>.</p>
<p>This is a recursive function and the only mildly tricky part is
deciding how to do whitespace if we want a prettier output.</p>
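<p>Before the real thing, here is a minimal sketch of one way to handle
the whitespace: pass the current indentation down the recursion and
extend it by one level for children. The <code>Node</code> type and
<code>print_node</code> function are hypothetical and only cover numbers
and lists. They are not the <code>JSONValue</code> type from this
post.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tree type: a node is either a leaf number or a list of nodes.
struct Node {
  bool is_list = false;
  int number = 0;
  std::vector<Node> children;
};

// `whitespace` is the indentation of the line the node starts on.
std::string print_node(const Node &n, const std::string &whitespace = "") {
  if (!n.is_list) {
    return std::to_string(n.number);
  }
  // Children are indented one level deeper than their parent.
  const std::string inner = whitespace + "  ";
  std::string s = "[\n";
  for (std::size_t i = 0; i < n.children.size(); i++) {
    s += inner + print_node(n.children[i], inner);
    if (i + 1 < n.children.size()) {
      s += ","; // No trailing comma after the last child.
    }
    s += "\n";
  }
  // The closing bracket lines up with the parent's indentation.
  return s + whitespace + "]";
}
```

<p>Each nested list ends up two spaces deeper than its parent, and the
closing bracket returns to the parent's column.</p>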
<div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="nf">deparse</span><span class="p">(</span><span class="n">JSONValue</span><span class="w"> </span><span class="n">v</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">whitespace</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span class="no">String</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"</span><span class="se">\"</span><span class="s">"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">v</span><span class="p">.</span><span class="n">string</span><span class="p">.</span><span class="n">value</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">"</span><span class="se">\"</span><span class="s">"</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span class="no">Boolean</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">boolean</span><span class="p">.</span><span class="n">value</span><span class="p">()</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s">"true"</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s">"false"</span><span class="p">);</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span class="no">Number</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">to_string</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">number</span><span class="p">.</span><span class="n">value</span><span class="p">());</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span class="no">Null</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">"null"</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span class="no">Array</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"[</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">v</span><span class="p">.</span><span class="n">array</span><span class="p">.</span><span class="n">value</span><span class="p">();</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">size</span><span class="p">();</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">" "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">deparse</span><span class="p">(</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">" "</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">","</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">"]"</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span class="no">Object</span><span class="p">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"{</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">v</span><span class="p">.</span><span class="n">object</span><span class="p">.</span><span class="n">value</span><span class="p">();</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="o">&</span><span class="p">[</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">]</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">values</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">" "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">"</span><span class="se">\"</span><span class="s">"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">+</span>
<span class="w"> </span><span class="s">"</span><span class="se">\"</span><span class="s">: "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">deparse</span><span class="p">(</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">" "</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">values</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">","</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">;</span>
<span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">"}"</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Done. Done. Done.</p>
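<p>One caveat on the <code>String</code> case above: it wraps the value in quotes without escaping anything. If the parsed string holds unescaped contents (an assumption about the lexer, which is not shown here), a value containing a quote or newline would come back out as invalid JSON. A minimal sketch of an escaping helper follows; <code>escape_json_string</code> is a hypothetical name and not part of the library in this post.</p>

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the library in this post): escape a raw
// string so it can be emitted between double quotes as a JSON string.
std::string escape_json_string(const std::string &raw) {
  std::string out;
  out.reserve(raw.size());
  for (unsigned char c : raw) {
    switch (c) {
    case '"':  out += "\\\""; break;
    case '\\': out += "\\\\"; break;
    case '\n': out += "\\n"; break;
    case '\r': out += "\\r"; break;
    case '\t': out += "\\t"; break;
    default:
      if (c < 0x20) {
        // Remaining control characters must be written as \u00XX.
        char buf[7];
        std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
        out += buf;
      } else {
        out += static_cast<char>(c);
      }
    }
  }
  return out;
}
```

<p>Under that assumption, the <code>String</code> case could call this helper on <code>v.string.value()</code> before adding the surrounding quotes, and the object case could do the same for each key.</p>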
<h3 id="main.cpp">main.cpp</h3><p>This program will simply accept a JSON input, parse it, and pretty
print it right back out. Kind of like a simplified <code>jq</code>.</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"json.hpp"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><iostream></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">argc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cerr</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="s">"Expected JSON input argument to parse"</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">in</span><span class="p">{</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]};</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">ast</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">json</span><span class="o">::</span><span class="n">parse</span><span class="p">(</span><span class="n">in</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cerr</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">error</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cout</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="n">json</span><span class="o">::</span><span class="n">deparse</span><span class="p">(</span><span class="n">ast</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Build it with the <code>make</code> setup we already defined, and run it against
something big like
<a href="https://github.com/eatonphil/cpp-json/blob/main/test/glossary.json">this</a>.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>cpp-json
$<span class="w"> </span>make
$<span class="w"> </span>./main<span class="w"> </span><span class="s2">"</span><span class="k">$(</span>cat<span class="w"> </span>./test/glossary.json<span class="k">)</span><span class="s2">"</span>
<span class="o">{</span>
<span class="w"> </span><span class="s2">"glossary"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"GlossDiv"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"GlossList"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"GlossEntry"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"Abbrev"</span>:<span class="w"> </span><span class="s2">"ISO 8879:1986"</span>,
<span class="w"> </span><span class="s2">"Acronym"</span>:<span class="w"> </span><span class="s2">"SGML"</span>,
<span class="w"> </span><span class="s2">"GlossDef"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"GlossSeeAlso"</span>:<span class="w"> </span><span class="o">[</span>
<span class="w"> </span><span class="s2">"GML"</span>,
<span class="w"> </span><span class="s2">"XML"</span>
<span class="w"> </span><span class="o">]</span>,
<span class="w"> </span><span class="s2">"para"</span>:<span class="w"> </span><span class="s2">"A meta-markup language, used to create markup languages such as DocBook."</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"GlossSee"</span>:<span class="w"> </span><span class="s2">"markup"</span>,
<span class="w"> </span><span class="s2">"GlossTerm"</span>:<span class="w"> </span><span class="s2">"Standard Generalized Markup Language"</span>,
<span class="w"> </span><span class="s2">"ID"</span>:<span class="w"> </span><span class="s2">"SGML"</span>,
<span class="w"> </span><span class="s2">"SortAs"</span>:<span class="w"> </span><span class="s2">"SGML"</span>
<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"title"</span>:<span class="w"> </span><span class="s2">"S"</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"title"</span>:<span class="w"> </span><span class="s2">"example glossary"</span>
<span class="w"> </span><span class="o">}</span>
<span class="o">}</span>
</pre></div>
<p>Or something incorrect like:</p>
<div class="highlight"><pre><span></span>./main<span class="w"> </span><span class="s1">'{"foo": [{ 1: 2 }]}'</span>
Unexpected<span class="w"> </span>token<span class="w"> </span><span class="s1">'1'</span>,<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="s1">'Number'</span>,<span class="w"> </span>index
Expected<span class="w"> </span>string<span class="w"> </span>key<span class="w"> </span><span class="k">in</span><span class="w"> </span>object<span class="w"> </span>at<span class="w"> </span>line<span class="w"> </span><span class="m">1</span>,<span class="w"> </span>column<span class="w"> </span><span class="m">11</span>
<span class="o">{</span><span class="s2">"foo"</span>:<span class="w"> </span><span class="o">[{</span><span class="w"> </span><span class="m">1</span>:<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="o">}]}</span>
<span class="w"> </span>^
</pre></div>
<p>And give Valgrind the old try:</p>
<div class="highlight"><pre><span></span>valgrind<span class="w"> </span>./main<span class="w"> </span><span class="s1">'{"a": [1, 2, null, { "c": 129 }]}'</span>
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>Memcheck,<span class="w"> </span>a<span class="w"> </span>memory<span class="w"> </span>error<span class="w"> </span><span class="nv">detector</span>
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>Copyright<span class="w"> </span><span class="o">(</span>C<span class="o">)</span><span class="w"> </span><span class="m">2002</span>-2017,<span class="w"> </span>and<span class="w"> </span>GNU<span class="w"> </span>GPL<span class="err">'</span>d,<span class="w"> </span>by<span class="w"> </span>Julian<span class="w"> </span>Seward<span class="w"> </span>et<span class="w"> </span>al.
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>Using<span class="w"> </span>Valgrind-3.17.0<span class="w"> </span>and<span class="w"> </span>LibVEX<span class="p">;</span><span class="w"> </span>rerun<span class="w"> </span>with<span class="w"> </span>-h<span class="w"> </span><span class="k">for</span><span class="w"> </span>copyright<span class="w"> </span><span class="nv">info</span>
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>Command:<span class="w"> </span>./main<span class="w"> </span><span class="o">{</span><span class="s2">"a"</span>:<span class="se">\ </span><span class="o">[</span><span class="m">1</span>,<span class="se">\ </span><span class="m">2</span>,<span class="se">\ </span>null,<span class="se">\ </span><span class="o">{</span><span class="se">\ </span><span class="s2">"c"</span>:<span class="se">\ </span><span class="m">129</span><span class="se">\ </span><span class="o">}]}</span>
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span>
<span class="o">{</span>
<span class="w"> </span><span class="s2">"a"</span>:<span class="w"> </span><span class="o">[</span>
<span class="w"> </span><span class="m">1</span>.000000,
<span class="w"> </span><span class="m">2</span>.000000,
<span class="w"> </span>null,
<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"c"</span>:<span class="w"> </span><span class="m">129</span>.000000
<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="o">]</span>
<span class="o">}==</span><span class="nv">153027</span><span class="o">==</span>
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>HEAP<span class="w"> </span>SUMMARY:
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>use<span class="w"> </span>at<span class="w"> </span>exit:<span class="w"> </span><span class="m">0</span><span class="w"> </span>bytes<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="nv">blocks</span>
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>total<span class="w"> </span>heap<span class="w"> </span>usage:<span class="w"> </span><span class="m">128</span><span class="w"> </span>allocs,<span class="w"> </span><span class="m">128</span><span class="w"> </span>frees,<span class="w"> </span><span class="m">105</span>,386<span class="w"> </span>bytes<span class="w"> </span><span class="nv">allocated</span>
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span>
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>All<span class="w"> </span>heap<span class="w"> </span>blocks<span class="w"> </span>were<span class="w"> </span>freed<span class="w"> </span>--<span class="w"> </span>no<span class="w"> </span>leaks<span class="w"> </span>are<span class="w"> </span><span class="nv">possible</span>
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span>
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>For<span class="w"> </span>lists<span class="w"> </span>of<span class="w"> </span>detected<span class="w"> </span>and<span class="w"> </span>suppressed<span class="w"> </span>errors,<span class="w"> </span>rerun<span class="w"> </span>with:<span class="w"> </span>-s
<span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>ERROR<span class="w"> </span>SUMMARY:<span class="w"> </span><span class="m">0</span><span class="w"> </span>errors<span class="w"> </span>from<span class="w"> </span><span class="m">0</span><span class="w"> </span>contexts<span class="w"> </span><span class="o">(</span>suppressed:<span class="w"> </span><span class="m">0</span><span class="w"> </span>from<span class="w"> </span><span class="m">0</span><span class="o">)</span>
</pre></div>
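<p>Notice that the numbers come back as <code>1.000000</code> and <code>129.000000</code>. That is because <code>std::to_string</code> formats a <code>double</code> like <code>printf</code>'s <code>%f</code>, with six digits after the decimal point. If shorter output is wanted, one option is a small formatting helper like the sketch below. <code>format_number</code> is a hypothetical name and not part of the library in this post.</p>

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical replacement for std::to_string in the Number case.
// Whole numbers that a double represents exactly print without a fraction;
// everything else prints with enough digits to round-trip.
std::string format_number(double n) {
  char buf[32];
  if (std::isfinite(n) && std::floor(n) == n &&
      std::fabs(n) < 9007199254740992.0 /* 2^53 */) {
    std::snprintf(buf, sizeof(buf), "%.0f", n);
  } else {
    std::snprintf(buf, sizeof(buf), "%.17g", n);
  }
  return buf;
}
```

<p>The trade-off is that <code>%.17g</code> round-trips but can be noisy for values like <code>0.1</code>; a shortest-representation formatter would be nicer but is more work.</p>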
<p>Pretty sweet. Modern C++, I like it!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I don't do a lot of C++ so I wanted to get a sense for what it can look like today.<br><br>This post walks through a number of new-ish C++ features as we build a handwritten recursive descent parser for JSON using only the standard library.<a href="https://t.co/cCN6nP0pDi">https://t.co/cCN6nP0pDi</a> <a href="https://t.co/0AZNEZv4Ss">pic.twitter.com/0AZNEZv4Ss</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1431000902710796292?ref_src=twsrc%5Etfw">August 26, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/writing-a-simple-json-library-in-modern-cpp.htmlThu, 26 Aug 2021 00:00:00 +0000
- Parser generators vs. handwritten parsers: surveying major language implementations in 2021http://notes.eatonphil.com/parser-generators-vs-handwritten-parsers-survey-2021.html<p>Developers often think parser generators are the sole legit way to
build programming language frontends, possibly because compiler
courses in university teach lex/yacc variants. But do any modern
programming languages actually use parser generators anymore?</p>
<p>To find out, this post presents a non-definitive survey of the parsing
techniques used by various major programming language implementations.</p>
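<p>For readers who have not seen one, "handwritten" in this survey usually means some form of recursive descent: roughly one function per grammar rule, calling each other directly. The sketch below is a toy arithmetic evaluator written that way. The grammar and the <code>Parser</code> class are made up for illustration and are not taken from any of the implementations surveyed.</p>

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Toy grammar (hypothetical, for illustration only):
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class Parser {
public:
  explicit Parser(std::string input) : src_(std::move(input)) {}

  // Parse and evaluate the whole input, rejecting trailing garbage.
  double parse() {
    double v = expr();
    skip_ws();
    if (pos_ != src_.size()) throw std::runtime_error("Unexpected trailing input");
    return v;
  }

private:
  std::string src_;
  std::size_t pos_ = 0;

  void skip_ws() {
    while (pos_ < src_.size() &&
           std::isspace(static_cast<unsigned char>(src_[pos_]))) pos_++;
  }

  // Consume `c` if it is the next non-whitespace character.
  bool accept(char c) {
    skip_ws();
    if (pos_ < src_.size() && src_[pos_] == c) { pos_++; return true; }
    return false;
  }

  double expr() {
    double v = term();
    while (true) {
      if (accept('+')) v += term();
      else if (accept('-')) v -= term();
      else return v;
    }
  }

  double term() {
    double v = factor();
    while (true) {
      if (accept('*')) v *= factor();
      else if (accept('/')) v /= factor();
      else return v;
    }
  }

  double factor() {
    if (accept('(')) {
      double v = expr();
      if (!accept(')')) throw std::runtime_error("Expected ')'");
      return v;
    }
    std::size_t start = pos_;
    while (pos_ < src_.size() &&
           (std::isdigit(static_cast<unsigned char>(src_[pos_])) ||
            src_[pos_] == '.')) pos_++;
    if (start == pos_) throw std::runtime_error("Expected number");
    return std::stod(src_.substr(start, pos_ - start));
  }
};
```

<p>Operator precedence falls out of which function calls which, and error messages can be written at exactly the point where the parser knows what it expected. Those are two commonly cited reasons for going handwritten.</p>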
<h3 id="cpython:-peg-parser">CPython: PEG parser</h3><p>Until CPython 3.9 the default parser
was built using <a href="https://www.python.org/dev/peps/pep-0269/">pgen</a>, a
custom parser generator. (The pgen-based parser is removed entirely in 3.10, which
hasn't been released yet.) The team thought the PEG parser was a <a href="https://www.python.org/dev/peps/pep-0617/">better
fit for expressing the
language</a>. At the time, the
switch from pgen to the PEG parser improved speed by 10% but increased memory
usage by 10% as well.</p>
<p>The PEG grammar is defined
<a href="https://github.com/python/cpython/blob/v3.9.6/Grammar/python.gram">here</a>. (It
is getting renamed in 3.10 though so check the directory for a file of
a similar name if you browse 3.10+).</p>
<p class="note">
This section was corrected
by <a href="https://www.reddit.com/r/ProgrammingLanguages/comments/p8vvcs/parser_generators_vs_handwritten_parsers/h9tbuve/?utm_source=reddit&utm_medium=web2x&context=3">MegaIng</a> on Reddit. Originally
I mistakenly claimed the previous parser was
handwritten. It was not.
<br /><br />
Thanks <a href="https://twitter.com/jryans">J. Ryan Stinnett</a> for
a correction about the change in speed in the new PEG parser.
</p><h3 id="gcc:-handwritten">GCC: Handwritten</h3><p>Source code for the C parser is available
<a href="https://github.com/gcc-mirror/gcc/blob/releases/gcc-12.1.0/gcc/c/c-parser.cc">here</a>. The
C parser used Bison until <a href="https://gcc.gnu.org/gcc-4.1/changes.html">GCC 4.1 in
2006</a>. The C++ parser also
switched from Bison to a handwritten parser <a href="https://gcc.gnu.org/gcc-3.4/changes.html">2 years
earlier</a>.</p>
<h3 id="clang:-handwritten">Clang: Handwritten</h3><p>Not only handwritten but the same <em>file</em> handles parsing C,
Objective-C and C++. Source code is available
<a href="https://github.com/llvm/llvm-project/blob/llvmorg-12.0.1/clang/lib/Parse/Parser.cpp">here</a>.</p>
<h3 id="ruby:-yacc-like-parser-generator">Ruby: Yacc-like Parser Generator</h3><p>Ruby uses Bison. The
grammar for the language can be found
<a href="https://github.com/ruby/ruby/blob/v3_0_2/parse.y">here</a>.</p>
<h3 id="v8-javascript:-handwritten">V8 JavaScript: Handwritten</h3><p>Source code available <a href="https://github.com/v8/v8/blob/9.5.38/src/parsing/parser.cc">here</a>.</p>
<h3 id="zend-engine-php:-yacc-like-parser-generator">Zend Engine PHP: Yacc-like Parser Generator</h3><p>Source code available <a href="https://github.com/php/php-src/blob/php-8.0.9/Zend/zend_language_parser.y">here</a>.</p>
<h3 id="typescript:-handwritten">TypeScript: Handwritten</h3><p>Source code available <a href="https://github.com/microsoft/TypeScript/blob/v4.3.5/src/compiler/parser.ts">here</a>.</p>
<h3 id="bash:-yacc-like-parser-generator">Bash: Yacc-like Parser Generator</h3><p>Source code for the grammar is available
<a href="http://git.savannah.gnu.org/cgit/bash.git/tree/parse.y?h=bash-5.1">here</a>.</p>
<h3 id="chromium-css-parser:-handwritten">Chromium CSS Parser: Handwritten</h3><p>Source code available <a href="https://github.com/chromium/chromium/blob/95.0.4617.2/third_party/blink/renderer/core/css/parser/css_parser_impl.cc">here</a>.</p>
<h3 id="java-(openjdk):-handwritten">Java (OpenJDK): Handwritten</h3><p>You can find the source code
<a href="https://github.com/openjdk/jdk/blob/jdk-18%2B11/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavacParser.java">here</a>.</p>
<p>Some <a href="https://openjdk.java.net/projects/compiler-grammar/">older
commentary</a> calls
this implementation fragile. But a Java contributor <a href="https://twitter.com/BrianGoetz/status/1429227723055042568">suggests the
situation has improved since Java
8</a>.</p>
<h3 id="golang:-handwritten">Golang: Handwritten</h3><p>Until Go 1.6 the compiler used a yacc-based parser. The source code
for that grammar is available
<a href="https://github.com/golang/go/blob/go1.5/src/cmd/compile/internal/gc/y.go">here</a>.</p>
<p>In Go 1.6 they switched to a handwritten parser. You can find that
change <a href="https://go-review.googlesource.com/c/go/+/16665/">here</a>. There
was a reported 18% speed increase when parsing files and a reported 3%
speed increase in building the compiler itself when switching.</p>
<p>You can find the source code for the compiler's parser
<a href="https://github.com/golang/go/blob/go1.17/src/cmd/compile/internal/syntax/parser.go">here</a>.</p>
<h3 id="roslyn:-handwritten">Roslyn: Handwritten</h3><p>The C# parser source code is available
<a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2019-Version-16.11/src/Compilers/CSharp/Portable/Parser/LanguageParser.cs">here</a>. The
Visual Basic parser source code is
<a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2019-Version-16.11/src/Compilers/VisualBasic/Portable/Parser/Parser.vb">here</a>.</p>
<p>A C# contributor mentioned a few key reasons for using a handwritten parser <a href="https://news.ycombinator.com/item?id=13915150">here</a>.</p>
<h3 id="lua:-handwritten">Lua: Handwritten</h3><p>Source code available <a href="https://github.com/lua/lua/blob/v5.4.3/lparser.c">here</a>.</p>
<h3 id="swift:-handwritten">Swift: Handwritten</h3><p>Source code available <a href="https://github.com/apple/swift/blob/swift-5.4.2-RELEASE/lib/Parse/Parser.cpp">here</a>.</p>
<h3 id="r:-yacc-like-parser-generator">R: Yacc-like Parser Generator</h3><p>I couldn't find it at first but
<a href="https://www.reddit.com/r/programming/comments/p8vv1l/parser_generators_vs_handwritten_parsers/h9tl763/?utm_source=reddit&utm_medium=web2x&context=3">Liorithiel</a>
showed me the parser source code is
<a href="https://github.com/wch/r-source/blob/trunk/src/main/gram.y">here</a>.</p>
<h3 id="julia:-handwritten-...-in-scheme">Julia: Handwritten ... in Scheme</h3><p>Julia's parser is handwritten but not in Julia. It's in Scheme! Source code available <a href="https://github.com/JuliaLang/julia/blob/v1.6.2/src/julia-parser.scm">here</a>.</p>
<h3 id="postgresql:-yacc-like-parser-generator">PostgreSQL: Yacc-like Parser Generator</h3><p>PostgreSQL uses Bison for parsing queries. Source code for the grammar
available
<a href="https://github.com/postgres/postgres/blob/REL_13_STABLE/src/backend/parser/gram.y">here</a>.</p>
<h3 id="mysql:-yacc-parser-generator">MySQL: Yacc Parser Generator</h3><p>Source code for the grammar available
<a href="https://github.com/mysql/mysql-server/blob/8.0/sql/sql_yacc.yy">here</a>.</p>
<h3 id="sqlite:-yacc-like-parser-generator">SQLite: Yacc-like Parser Generator</h3><p>SQLite uses its own parser generator called
<a href="https://www.sqlite.org/lemon.html">Lemon</a>. Source code for the
grammar is available
<a href="https://github.com/sqlite/sqlite/blob/version-3.36.0/src/parse.y">here</a>.</p>
<h3 id="summary">Summary</h3><p>Of the <a href="https://redmonk.com/sogrady/2021/03/01/language-rankings-1-21/">2021 Redmonk top 10
languages</a>,
7 of them have a handwritten parser. Python, PHP and Ruby use parser generators.</p>
<p>Although parser generators are still used in major language
implementations, maybe it's time for universities to start teaching
handwritten parsing?</p>
<p class="note">
This tweet was published before I was corrected about Python's
parser, and it also overlooks PHP. It should say 7/10 but I cannot edit the tweet.
</p><p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Let's actually survey the parsing techniques used by major programming languages in 2021 (with links to code 👾).<br><br>In this post we discover that 9/10 of the top languages by <a href="https://twitter.com/redmonk?ref_src=twsrc%5Etfw">@redmonk</a> use a handwritten parser as opposed to a parser generator. 😱<a href="https://t.co/M69TqN78G5">https://t.co/M69TqN78G5</a> <a href="https://t.co/sGsdDmwshB">pic.twitter.com/sGsdDmwshB</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1429137493019045899?ref_src=twsrc%5Etfw">August 21, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/parser-generators-vs-handwritten-parsers-survey-2021.htmlSat, 21 Aug 2021 00:00:00 +0000
- Practical? Common Lisp on the JVM: A quick intro to ABCL for modern web appshttp://notes.eatonphil.com/practical-common-lisp-on-the-jvm.html<p>In a ridiculous attempt to <a href="https://news.ycombinator.com/item?id=28036679">prove someone on the internet
wrong</a> about the
practicality of Lisp (Common Lisp specifically), I tried to get a
simple (but realistic) web app running. After four days and <a href="https://github.com/armedbear/abcl/pull/379">a patch
to ABCL</a> I got something
working.</p>
<p>The code I had in mind would look something like this:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="k">let*</span><span class="w"> </span><span class="p">((</span><span class="nv">port</span><span class="w"> </span><span class="mi">8080</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nv">server</span><span class="w"> </span><span class="p">(</span><span class="nv">make-server</span><span class="w"> </span><span class="nv">port</span><span class="p">)))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">route</span><span class="w"> </span><span class="nv">server</span><span class="w"> </span><span class="s">"GET"</span><span class="w"> </span><span class="s">"/"</span><span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">ctx</span><span class="p">)</span><span class="w"> </span><span class="s">"My index!"</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">route</span><span class="w"> </span><span class="nv">server</span><span class="w"> </span><span class="s">"GET"</span><span class="w"> </span><span class="s">"/search"</span>
<span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">ctx</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nv">template</span><span class="w"> </span><span class="s">"search.tmpl"</span><span class="w"> </span><span class="o">'</span><span class="p">((</span><span class="s">"version"</span><span class="w"> </span><span class="s">"0.1.0"</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="s">"results"</span><span class="w"> </span><span class="p">(</span><span class="s">"cat"</span><span class="w"> </span><span class="s">"dog"</span><span class="w"> </span><span class="s">"mouse"</span><span class="p">)))))))</span>
</pre></div>
<p>And <code>search.tmpl</code> would be some Jinja-like text file:</p>
<div class="highlight"><pre><span></span><span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">title</span><span class="p">></span>Version {{ version }}<span class="p"></</span><span class="nt">title</span><span class="p">></span>
{% for item in results %}
<span class="p"><</span><span class="nt">h2</span><span class="p">></span>{{ item }}<span class="p"></</span><span class="nt">h2</span><span class="p">></span>
{% endfor %}
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</pre></div>
<p>The source code for this post can be found <a href="https://github.com/eatonphil/jvm-lisp-examples">on Github</a>.</p>
<h3 id="picking-a-language,-libraries">Picking a language, libraries</h3><p><a href="https://abcl.org">Armed Bear Common Lisp</a> (ABCL) is the only Common Lisp
implementation I'm aware of that can hook into a major library
ecosystem like that of the JVM or CLR. In theory, this makes it a safe
suggestion for folks who want the stability and resources of the
ecosystem even if they aren't using its flagship language.</p>
<p>I wanted to use some micro web framework like
<a href="https://sparkjava.com/">Spark</a> or <a href="https://micronaut.io/">Micronaut</a>.</p>
<p>The problem with libraries like Micronaut (and
<a href="https://eclipse-ee4j.github.io/jersey/">Jersey</a>) is that they do a
lot of dynamic inspection to figure out how to register controllers
and whatnot. This is certainly convenient for developers using the
library in Java. But it becomes an ordeal when you're trying to use
the library through a foreign function interface (FFI) in another
language. An example of this is if a framework scans all files in a
directory for a <code>@GET</code> annotation.</p>
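<p>To make that concrete, here is a minimal sketch of annotation-driven route discovery using only the Java standard library. The <code>@Get</code> annotation, <code>SearchController</code> and <code>AnnotationScanner</code> are hypothetical names made up for illustration; they are not taken from Micronaut or Jersey, which do considerably more than this.</p>

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a framework's @GET annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Get {
    String value();
}

// A controller written the way a Java user of such a framework would write it.
class SearchController {
    @Get("/")
    public String index() { return "My index!"; }

    @Get("/search")
    public String search() { return "results"; }

    // Not annotated, so the scanner ignores it.
    public String helper() { return "not a route"; }
}

class AnnotationScanner {
    // Map each annotated path to the name of the method that handles it.
    static Map<String, String> scan(Class<?> controller) {
        Map<String, String> routes = new TreeMap<>();
        for (Method m : controller.getDeclaredMethods()) {
            Get get = m.getAnnotation(Get.class);
            if (get != null) {
                routes.put(get.value(), m.getName());
            }
        }
        return routes;
    }

    public static void main(String[] args) {
        System.out.println(scan(SearchController.class));
    }
}
```

<p>Nothing here is hard in Java, but from Lisp you would have to generate classes carrying the right annotations before the framework could find anything, which is why a framework with an explicit registration call is easier to drive through an FFI.</p>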
<p>On the other hand, Spark seemed to have a hard requirement on bringing
in a WebSocket library, which caused some issues during
configuration. So I ended up going with <a href="https://jooby.io/">Jooby</a> and
<a href="https://netty.io/">Netty</a> (as the underlying server).</p>
<p>Finally, I looked into a few Jinja-like template libraries and settled
on <a href="https://pebbletemplates.io/">Pebble</a> since
<a href="https://github.com/HubSpot/jinjava">Jinjava</a> <a href="https://github.com/HubSpot/jinjava/issues/317">wouldn't load for
me</a>.</p>
<h3 id="3rd-party-jars-and-foreign-function-calls">3rd-party jars and foreign function calls</h3><p>So you've got your Maven dependencies and have run <code>mvn
install</code>. Your <code>pom.xml</code> looks like this:</p>
<div class="highlight"><pre><span></span><span class="cp"><?xml version="1.0" encoding="utf-8"?></span>
<span class="nt"><project></span>
<span class="w"> </span><span class="nt"><modelVersion></span>4.0.0<span class="nt"></modelVersion></span>
<span class="w"> </span><span class="nt"><groupId></span>com.github.eatonphil<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>abcl-rest-api-hello-world<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>1<span class="nt"></version></span>
<span class="w"> </span><span class="nt"><dependencies></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>io.jooby<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>jooby<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>2.10.0<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>io.jooby<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>jooby-netty<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>2.10.0<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>io.pebbletemplates<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>pebble<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>3.1.5<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"></dependencies></span>
<span class="nt"></project></span>
</pre></div>
<p>ABCL has a package called <code>abcl-asdf</code> that helps you resolve dependencies through Maven and your filesystem. We'll import it and a package it depends on (<code>abcl-contrib</code>):</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">require</span><span class="w"> </span><span class="ss">:abcl-contrib</span><span class="p">)</span>
<span class="p">(</span><span class="nb">require</span><span class="w"> </span><span class="ss">:abcl-asdf</span><span class="p">)</span>
</pre></div>
<p>All our code will go into a single <code>main.lisp</code> file.</p>
<p>To import a specific package from Maven you
call <code>abcl-asdf:resolve</code> with a colon-separated string
containing the Maven package group id and artifact id. Then you pass
that result to <code>abcl-asdf:as-classpath</code> and pass that
result to <code>java:add-to-classpath</code>.</p>
<p>It will look like this:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">setf</span><span class="w"> </span><span class="nv">imports</span><span class="w"> </span><span class="o">'</span><span class="p">(</span><span class="s">"io.jooby:jooby"</span>
<span class="w"> </span><span class="s">"io.jooby:jooby-netty"</span>
<span class="w"> </span><span class="s">"io.pebbletemplates:pebble"</span><span class="p">))</span>
<span class="p">(</span><span class="nb">loop</span><span class="w"> </span><span class="nv">for</span><span class="w"> </span><span class="nb">import</span><span class="w"> </span><span class="nv">in</span><span class="w"> </span><span class="nv">imports</span>
<span class="w"> </span><span class="nb">do</span><span class="w"> </span><span class="p">(</span><span class="nv">java:add-to-classpath</span>
<span class="w"> </span><span class="p">(</span><span class="nv">abcl-asdf:as-classpath</span><span class="w"> </span><span class="p">(</span><span class="nv">abcl-asdf:resolve</span><span class="w"> </span><span class="nb">import</span><span class="p">))))</span>
</pre></div>
<p>Now you can call functions within these packages. If you want to call
a Java method using only builtins it looks like <code>(jcall "method"
object arg1 arg2 ... argN)</code>. If you want to call a static Java
method you use <code>(jstatic "method" "com.organization.package.Class"
arg1 arg2 ... argN)</code> instead.</p>
<p>It seems that ABCL will automatically convert simple types from their
Lisp representation to Java but it will not turn a list into an
array. If a Java function requires an array you'll have to do that
explicitly with a function like <code>(java:jnew-array-from-list
"java.lang.String" my-string-list)</code>.</p>
<p>When using the builtin Java FFI you always need to use the fully
qualified name for classes like <code>java.lang.Object</code>
for <code>Object</code> or <code>java.util.Arrays</code>
for <code>Arrays</code>.</p>
<p>Alternatively you can <code>(require :jss)</code> to get access to a
simpler syntax for making Java calls. A method call looks
like <code>(#"method" object arg1 arg2 ... argN)</code>. Creating a
new instance of an object is calling <code>(jss:new
'className)</code>. When you use JSS you don't need to fully qualify a
class name unless there is more than one class with the same
name. For example, to create a new Jooby application instance we can
call <code>(jss:new 'Jooby)</code>. As long as the class can be found
in the class path JSS will resolve it.</p>
<h3 id="some-real-code">Some real code</h3><p>The real code will look similar to the pseudo-code at the top of this
article. We'll stub out the library-specific wrappers for rendering a
template and for registering a route.</p>
<p>Fumbling around the <a href="https://github.com/jooby-project/jooby/blob/2.x/jooby/src/main/java/io/jooby/Server.java#L35">Jooby source code</a> we see this snippet of Java:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Server</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Netty</span><span class="p">();</span><span class="w"> </span><span class="c1">// or Jetty or Utow</span>
<span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">App</span><span class="w"> </span><span class="n">app</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">App</span><span class="p">();</span>
<span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">server</span><span class="p">.</span><span class="na">start</span><span class="p">(</span><span class="n">app</span><span class="p">);</span>
<span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="o">*</span>
<span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">server</span><span class="p">.</span><span class="na">stop</span><span class="p">();</span>
</pre></div>
<p><code>Netty</code> comes from the <code>jooby-netty</code> artifact in
the <code>io.jooby</code> group on Maven. And <code>App</code> is some
object that extends <code>io.jooby.Jooby</code>. Since we're not using
an OOP language though we're going to try avoiding classes as much as
possible. So we'll just create a new instance
of <code>io.jooby.Jooby</code> and add routes directly to it.</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">template</span><span class="w"> </span><span class="p">(</span><span class="nv">filename</span><span class="w"> </span><span class="nv">context</span><span class="p">)</span>
<span class="w"> </span><span class="s">""</span><span class="p">)</span>
<span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">route</span><span class="w"> </span><span class="p">(</span><span class="nv">app</span><span class="w"> </span><span class="nc">method</span><span class="w"> </span><span class="nv">path</span><span class="w"> </span><span class="nv">handler</span><span class="p">)</span>
<span class="w"> </span><span class="no">nil</span><span class="p">)</span>
<span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">register-endpoints</span><span class="w"> </span><span class="p">(</span><span class="nv">app</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nv">route</span><span class="w"> </span><span class="nv">app</span><span class="w"> </span><span class="s">"GET"</span><span class="w"> </span><span class="s">"/"</span>
<span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">ctx</span><span class="p">)</span><span class="w"> </span><span class="s">"An index!"</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">route</span><span class="w"> </span><span class="nv">app</span><span class="w"> </span><span class="s">"GET"</span><span class="w"> </span><span class="s">"/search"</span>
<span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">ctx</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nv">template</span><span class="w"> </span><span class="s">"search.tmpl"</span><span class="w"> </span><span class="o">`</span><span class="p">((</span><span class="s">"version"</span><span class="w"> </span><span class="s">"1.0.0"</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="s">"results"</span><span class="w"> </span><span class="o">,</span><span class="p">(</span><span class="nv">java:jarray-from-list</span><span class="w"> </span><span class="o">'</span><span class="p">(</span><span class="s">"cat"</span><span class="w"> </span><span class="s">"dog"</span><span class="w"> </span><span class="s">"mouse"</span><span class="p">)))))))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">route</span><span class="w"> </span><span class="nv">app</span><span class="w"> </span><span class="s">"GET"</span><span class="w"> </span><span class="s">"/hello-world"</span>
<span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">ctx</span><span class="p">)</span><span class="w"> </span><span class="s">"Hello world!"</span><span class="p">)))</span>
<span class="p">(</span><span class="k">let*</span><span class="w"> </span><span class="p">((</span><span class="nv">port</span><span class="w"> </span><span class="mi">8080</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nv">server</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">'Netty</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">app</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">'Jooby</span><span class="p">)))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">register-endpoints</span><span class="w"> </span><span class="nv">app</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="l l-Other">#"setOptions"</span><span class="w"> </span><span class="nv">server</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#"setPort"</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">'ServerOptions</span><span class="p">)</span><span class="w"> </span><span class="nv">port</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="l l-Other">#"start"</span><span class="w"> </span><span class="nv">server</span><span class="w"> </span><span class="nv">app</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="l l-Other">#"join"</span><span class="w"> </span><span class="nv">server</span><span class="p">))</span>
</pre></div>
<p>Easy enough. Now we just need to implement <code>route</code>
and <code>template</code>.</p>
<h3 id="implementing-java-classes-in-abcl">Implementing Java classes in ABCL</h3><p>We are again not taking the happy path with fancy Java syntax (which is
fine if you're using Java) like the Jooby documentation
suggests. Scouring the <a href="https://github.com/jooby-project/jooby/blob/2.x/jooby/src/main/java/io/jooby/Jooby.java#L546">Jooby source code
again</a>
it looks like we can call <code>route</code> on the <code>Jooby</code>
class with a method string, a path string, and an instance of an
object implementing the <code>io.jooby.Route.Handler</code> interface.</p>
<p>Since this handler argument is an interface, we cannot cheat again by
creating an instance of it. We'll have to actually create a new class
in Lisp that implements it. Thankfully there's only one method we need to
implement to satisfy this interface,
<a href="https://github.com/jooby-project/jooby/blob/2.x/jooby/src/main/java/io/jooby/Route.java#L256">apply</a>. It
accepts a <code>io.jooby.Context</code> object and returns
a <code>java.lang.Object</code>. The framework then does introspection
to figure out what exactly the object is and if it needs to transform
it into a string to be returned as an HTTP response body.</p>
<p>To create a new class in ABCL we call <code>(java:jnew-runtime-class
"classname" :interfaces '("an interface name") :methods '(("method
name 1" "return type" ("first parameter type" ...) (lambda (this arg1
...) body))))</code>:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">route</span><span class="w"> </span><span class="p">(</span><span class="nv">app</span><span class="w"> </span><span class="nc">method</span><span class="w"> </span><span class="nv">path</span><span class="w"> </span><span class="nv">handler</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="l l-Other">#"route"</span>
<span class="w"> </span><span class="nv">app</span>
<span class="w"> </span><span class="nc">method</span>
<span class="w"> </span><span class="nv">path</span>
<span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="p">(</span><span class="nv">java:jnew-runtime-class</span>
<span class="w"> </span><span class="p">(</span><span class="nb">substitute</span><span class="w"> </span><span class="sc">#\$</span><span class="w"> </span><span class="sc">#\/</span><span class="w"> </span><span class="p">(</span><span class="nb">substitute</span><span class="w"> </span><span class="sc">#\$</span><span class="w"> </span><span class="sc">#\-</span><span class="w"> </span><span class="nv">path</span><span class="p">))</span>
<span class="w"> </span><span class="ss">:interfaces</span><span class="w"> </span><span class="o">'</span><span class="p">(</span><span class="s">"io.jooby.Route$Handler"</span><span class="p">)</span>
<span class="w"> </span><span class="ss">:methods</span><span class="w"> </span><span class="o">`</span><span class="p">(</span>
<span class="w"> </span><span class="p">(</span><span class="s">"apply"</span><span class="w"> </span><span class="s">"java.lang.Object"</span><span class="w"> </span><span class="p">(</span><span class="s">"io.jooby.Context"</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">this</span><span class="w"> </span><span class="nv">ctx</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nb">funcall</span><span class="w"> </span><span class="o">,</span><span class="nv">handler</span><span class="w"> </span><span class="nv">ctx</span><span class="p">))))))))</span>
</pre></div>
<p>One thing to note is that when referring to a nested class or interface
we need to address it with the <code>io.jooby.Route$Handler</code>
syntax rather than as you might refer to it in Java
as <code>io.jooby.Route.Handler</code>. In the latter case ABCL
thinks <code>Route</code> is a package when in fact it's just a class.</p>
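<p>The same distinction shows up in plain Java reflection, which may be a helpful way to remember it. The sketch below uses <code>java.util.Map.Entry</code> from the standard library instead of Jooby's handler, and the <code>NestedNames</code> class is a made-up name for illustration.</p>

```java
import java.util.Map;

class NestedNames {
    // Look a class up by name, the way a reflective FFI has to.
    static boolean resolves(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Binary name, which is what the JVM and Class.forName use.
        System.out.println(Map.Entry.class.getName());
        // Canonical name, which is what you write in Java source.
        System.out.println(Map.Entry.class.getCanonicalName());
        System.out.println(resolves("java.util.Map$Entry"));
        System.out.println(resolves("java.util.Map.Entry"));
    }
}
```

<p>Only the <code>$</code> form resolves at runtime. The dotted form is a source-level convenience that the Java compiler translates for you.</p>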
<p>If you run this now with <code>abcl --load main.lisp</code>, it will
work until you hit an endpoint. The problem is how Jooby tries to
figure out the real type of the returned object.</p>
<p>The app will crash somewhere around
<a href="https://github.com/jooby-project/jooby/blob/2.x/jooby/src/main/java/io/jooby/internal/RouterImpl.java#L560">here</a>
calling <code>analyzer.returnType(route.getHandle())</code>.</p>
<p>In this case it tries to <a href="https://github.com/jooby-project/jooby/blob/f47eda4500bc4b76b23d24d4d77aa2ab3cc19e95/jooby/src/main/java/io/jooby/internal/RouteAnalyzer.java#L44">open and parse the (Java) source
code</a>
of our application to try to find the return type for
this <code>apply</code> function.</p>
<p>That's a problem since our code isn't Java. Through trial and error I
realized we can trick Jooby/Java/somebody into figuring out the
correct return type by adding another implementation
of <code>apply</code> that returns a <code>String</code> to our class.</p>
<p>The full <code>route</code> code now looks like this:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">route</span><span class="w"> </span><span class="p">(</span><span class="nv">app</span><span class="w"> </span><span class="nc">method</span><span class="w"> </span><span class="nv">path</span><span class="w"> </span><span class="nv">handler</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="l l-Other">#"route"</span>
<span class="w"> </span><span class="nv">app</span>
<span class="w"> </span><span class="nc">method</span>
<span class="w"> </span><span class="nv">path</span>
<span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="p">(</span><span class="nv">java:jnew-runtime-class</span>
<span class="w"> </span><span class="p">(</span><span class="nb">substitute</span><span class="w"> </span><span class="sc">#\$</span><span class="w"> </span><span class="sc">#\/</span><span class="w"> </span><span class="p">(</span><span class="nb">substitute</span><span class="w"> </span><span class="sc">#\$</span><span class="w"> </span><span class="sc">#\-</span><span class="w"> </span><span class="nv">path</span><span class="p">))</span>
<span class="w"> </span><span class="ss">:interfaces</span><span class="w"> </span><span class="o">'</span><span class="p">(</span><span class="s">"io.jooby.Route$Handler"</span><span class="p">)</span>
<span class="w"> </span><span class="ss">:methods</span><span class="w"> </span><span class="o">`</span><span class="p">(</span>
<span class="w"> </span><span class="c1">;; Need to define this one to make Jooby figure out the return type</span>
<span class="w"> </span><span class="c1">;; Otherwise it tries to read "this file" which isn't a Java file so cannot be parsed</span>
<span class="w"> </span><span class="p">(</span><span class="s">"apply"</span><span class="w"> </span><span class="s">"java.lang.String"</span><span class="w"> </span><span class="p">(</span><span class="s">"io.jooby.Context"</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">this</span><span class="w"> </span><span class="nv">ctx</span><span class="p">)</span><span class="w"> </span><span class="no">nil</span><span class="p">))</span>
<span class="w"> </span><span class="c1">;; This one actually gets called</span>
<span class="w"> </span><span class="p">(</span><span class="s">"apply"</span><span class="w"> </span><span class="s">"java.lang.Object"</span><span class="w"> </span><span class="p">(</span><span class="s">"io.jooby.Context"</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">this</span><span class="w"> </span><span class="nv">ctx</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nb">funcall</span><span class="w"> </span><span class="o">,</span><span class="nv">handler</span><span class="w"> </span><span class="nv">ctx</span><span class="p">))))))))</span>
</pre></div>
<p>You may wonder why we keep the original method around. It's because
without it ABCL complains, during reflection, that no method
returning <code>String</code> exists in the <code>Handler</code>
interface. That's fair I guess.</p>
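<p>Two methods that differ only in return type can't be written directly in Java source, but they are legal in a class file, and the Java compiler produces them itself whenever an override narrows a return type. Here is a small sketch of that, using a hypothetical <code>RouteHandler</code> interface as a stand-in for Jooby's handler (with a <code>String</code> in place of the context object). Whether this is exactly the mechanism that satisfies Jooby's return type analysis is an assumption on my part.</p>

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for io.jooby.Route.Handler; a String replaces Context.
interface RouteHandler {
    Object apply(String ctx);
}

// Java source declares a single apply, with a narrower return type.
class StringHandler implements RouteHandler {
    @Override
    public String apply(String ctx) { return "Hello " + ctx; }
}

class ReturnTypeOverloads {
    // Collect the return type of every declared method named "apply".
    static List<String> applyReturnTypes(Class<?> cls) {
        List<String> types = new ArrayList<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (m.getName().equals("apply")) {
                types.add(m.getReturnType().getName());
            }
        }
        Collections.sort(types);
        return types;
    }

    public static void main(String[] args) {
        // The compiler adds a synthetic bridge method returning Object,
        // so reflection reports two apply methods.
        System.out.println(applyReturnTypes(StringHandler.class));
    }
}
```

<p>Reflection reports both a <code>java.lang.Object</code> and a <code>java.lang.String</code> version of <code>apply</code>, which is the same shape as the runtime class built in Lisp.</p>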
<h3 id="implementing-the-template">Implementing the template</h3><p>The Java example on the <a href="https://pebbletemplates.io/">Pebble homepage</a>
is perfect:</p>
<div class="highlight"><pre><span></span><span class="n">PebbleEngine</span><span class="w"> </span><span class="n">engine</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">PebbleEngine</span><span class="p">.</span><span class="na">Builder</span><span class="p">().</span><span class="na">build</span><span class="p">();</span>
<span class="n">PebbleTemplate</span><span class="w"> </span><span class="n">compiledTemplate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">engine</span><span class="p">.</span><span class="na">getTemplate</span><span class="p">(</span><span class="s">"home.html"</span><span class="p">);</span>
<span class="n">Map</span><span class="o"><</span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">Object</span><span class="o">></span><span class="w"> </span><span class="n">context</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">HashMap</span><span class="o"><></span><span class="p">();</span>
<span class="n">context</span><span class="p">.</span><span class="na">put</span><span class="p">(</span><span class="s">"name"</span><span class="p">,</span><span class="w"> </span><span class="s">"Mitchell"</span><span class="p">);</span>
<span class="n">Writer</span><span class="w"> </span><span class="n">writer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">StringWriter</span><span class="p">();</span>
<span class="n">compiledTemplate</span><span class="p">.</span><span class="na">evaluate</span><span class="p">(</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="n">context</span><span class="p">);</span>
<span class="n">String</span><span class="w"> </span><span class="n">output</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">writer</span><span class="p">.</span><span class="na">toString</span><span class="p">();</span>
</pre></div>
<p>We can easily translate this into Lisp:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">hashmap</span><span class="w"> </span><span class="p">(</span><span class="nv">alist</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">((</span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">'HashMap</span><span class="p">)))</span>
<span class="w"> </span><span class="p">(</span><span class="nb">loop</span><span class="w"> </span><span class="nv">for</span><span class="w"> </span><span class="nv">el</span><span class="w"> </span><span class="nv">in</span><span class="w"> </span><span class="nv">alist</span>
<span class="w"> </span><span class="nb">do</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#"put"</span><span class="w"> </span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="nb">car</span><span class="w"> </span><span class="nv">el</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb">cadr</span><span class="w"> </span><span class="nv">el</span><span class="p">)))</span>
<span class="w"> </span><span class="nb">map</span><span class="p">))</span>
<span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">template</span><span class="w"> </span><span class="p">(</span><span class="nv">filename</span><span class="w"> </span><span class="nv">context-alist</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">let*</span><span class="w"> </span><span class="p">((</span><span class="nv">ctx</span><span class="w"> </span><span class="p">(</span><span class="nv">hashmap</span><span class="w"> </span><span class="nv">context-alist</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">path</span><span class="w"> </span><span class="p">(</span><span class="nv">java:jstatic</span><span class="w"> </span><span class="s">"of"</span><span class="w"> </span><span class="s">"java.nio.file.Path"</span><span class="w"> </span><span class="nv">filename</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">file</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#"readString"</span><span class="w"> </span><span class="ss">'java.nio.file.Files</span><span class="w"> </span><span class="nv">path</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">engine</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#"build"</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">'PebbleEngine$Builder</span><span class="p">)))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">compiledTmpl</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#"getTemplate"</span><span class="w"> </span><span class="nv">engine</span><span class="w"> </span><span class="nv">filename</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">writer</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">'java.io.StringWriter</span><span class="p">)))</span>
<span class="w"> </span><span class="p">(</span><span class="l l-Other">#"evaluate"</span><span class="w"> </span><span class="nv">compiledTmpl</span><span class="w"> </span><span class="nv">writer</span><span class="w"> </span><span class="nv">ctx</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="l l-Other">#"toString"</span><span class="w"> </span><span class="nv">writer</span><span class="p">)))</span>
</pre></div>
<p>But if you run this with <code>abcl --load main.lisp</code> and hit the
<code>/search</code> endpoint, it will blow up saying "no such method"
exists at the call to <code>Path.of(filename)</code>.</p>
<p>After digging around I saw it was because
<a href="https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/nio/file/Path.html#of(java.lang.String,java.lang.String...%29">Path.of</a>
is a variadic function.</p>
<p>And while there are <a href="https://abcl.org/trac/changeset/15234">examples
of</a> using variadic functions
when the function only has a single parameter like
<code>java.util.Arrays.asList(T ...)</code>, employing that same
technique here continued to result in "no such method":</p>
<div class="highlight"><pre><span></span> (path (java:jstatic "of" "java.nio.file.Path" filename (jnew-array "java.lang.String" 0)))
</pre></div>
<p>Eventually I found an <a href="https://stackoverflow.com/questions/20440839/cant-invoke-method-with-varargs-parameters-with-reflection-nosuchmethodexcept">example of someone doing reflect/invoke on this
kind of a function
call</a>
and tried this logic on a local copy of the ABCL source code.</p>
<p>It worked. So I opened a <a href="https://github.com/armedbear/abcl/pull/379">pull request</a>.</p>
<p>So the full working code for <code>template</code> is:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">template</span><span class="w"> </span><span class="p">(</span><span class="nv">filename</span><span class="w"> </span><span class="nv">context-alist</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">let*</span><span class="w"> </span><span class="p">((</span><span class="nv">ctx</span><span class="w"> </span><span class="p">(</span><span class="nv">hashmap</span><span class="w"> </span><span class="nv">context-alist</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">path</span><span class="w"> </span><span class="p">(</span><span class="nv">java:jstatic</span><span class="w"> </span><span class="s">"of"</span><span class="w"> </span><span class="s">"java.nio.file.Path"</span><span class="w"> </span><span class="nv">filename</span><span class="w"> </span><span class="p">(</span><span class="nv">java:jnew-array</span><span class="w"> </span><span class="s">"java.lang.String"</span><span class="w"> </span><span class="mi">0</span><span class="p">)))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">file</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#"readString"</span><span class="w"> </span><span class="ss">'java.nio.file.Files</span><span class="w"> </span><span class="nv">path</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">engine</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#"build"</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">'PebbleEngine$Builder</span><span class="p">)))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">compiledTmpl</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#"getTemplate"</span><span class="w"> </span><span class="nv">engine</span><span class="w"> </span><span class="nv">filename</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">writer</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">'java.io.StringWriter</span><span class="p">)))</span>
<span class="w"> </span><span class="p">(</span><span class="l l-Other">#"evaluate"</span><span class="w"> </span><span class="nv">compiledTmpl</span><span class="w"> </span><span class="nv">writer</span><span class="w"> </span><span class="nv">ctx</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="l l-Other">#"toString"</span><span class="w"> </span><span class="nv">writer</span><span class="p">)))</span>
</pre></div>
<p>And to get this diff running locally:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>mkdir<span class="w"> </span>~/vendor
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>~/vendor
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/eatonphil/abcl
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>abcl
$<span class="w"> </span>git<span class="w"> </span>checkout<span class="w"> </span>pe/more-variadic
$<span class="w"> </span>sudo<span class="w"> </span><span class="o">{</span>dnf/brew/apt<span class="o">}</span><span class="w"> </span>install<span class="w"> </span>ant<span class="w"> </span>maven
$<span class="w"> </span>ant<span class="w"> </span>-f<span class="w"> </span>build.xml
</pre></div>
<p>And to run <code>main.lisp</code> using this diff:</p>
<div class="highlight"><pre><span></span><span class="o">$</span><span class="w"> </span><span class="o">~/</span><span class="n">vendor</span><span class="o">/</span><span class="n">abcl</span><span class="o">/</span><span class="n">abcl</span><span class="w"> </span><span class="o">--</span><span class="nb">load</span><span class="w"> </span><span class="n">main</span><span class="o">.</span><span class="n">lisp</span>
</pre></div>
<p>And to hit the API:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>localhost:8080/search
<html>
<title>Version<span class="w"> </span><span class="m">1</span>.0.0</title>
<span class="w"> </span><h2>cat</h2>
<span class="w"> </span><h2>dog</h2>
<span class="w"> </span><h2>mouse</h2>
</html>
$<span class="w"> </span>curl<span class="w"> </span>localhost:8080/hello-world
Hello<span class="w"> </span>world!%
</pre></div>
<p>Phew! Easy peasy.</p>
<h3 id="next-up">Next up</h3><p>I'm porting this example to Kawa to see how it fares. Blog post to come.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">In a ridiculous attempt to prove an internet wrong about the practicality of Lisp (Common Lisp specifically), I tried to get a simple (but realistic) web app running. After four days and a patch to ABCL I got something working.<a href="https://t.co/5UUWNR8Wnn">https://t.co/5UUWNR8Wnn</a> <a href="https://t.co/cZsx32IlKD">pic.twitter.com/cZsx32IlKD</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1423345414279942150?ref_src=twsrc%5Etfw">August 5, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/practical-common-lisp-on-the-jvm.htmlThu, 05 Aug 2021 00:00:00 +0000
- Writing an efficient object previewer for JavaScripthttp://notes.eatonphil.com/writing-an-efficient-javascript-object-previewer.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-07-15-writing-an-efficient-javascript-object-previewer.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2021-07-15-writing-an-efficient-javascript-object-previewer.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/writing-an-efficient-javascript-object-previewer.htmlThu, 15 Jul 2021 00:00:00 +0000
- React without webpack: fast path to a working app from scratchhttp://notes.eatonphil.com/react-without-webpack.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-07-08-react-without-webpack.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2021-07-08-react-without-webpack.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/react-without-webpack.htmlThu, 08 Jul 2021 00:00:00 +0000
- Controlled HTML select element in React has weird default UXhttp://notes.eatonphil.com/controlled-select-element-in-react-has-weird-ux.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-06-25-select-in-react-broken-by-default.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2021-06-25-select-in-react-broken-by-default.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/controlled-select-element-in-react-has-weird-ux.htmlFri, 25 Jun 2021 00:00:00 +0000
- Leaders, you need to share organization success stories more frequentlyhttp://notes.eatonphil.com/leaders-share-company-success-stories.html<p>This post goes out to anyone who leads a team: managers, directors,
VPs, executives. You need to share organization success stories with
your organization on a regular and frequent basis. Talk about sales
wins, talk about new services released, talk about the positive impact
of a recent organizational change. Just get in front of your entire
organization and tell them how the organization is making a positive
difference.</p>
<p>Do this at least every other week.</p>
<p>And in case it's not clear, by "success stories" I don't mean nonsense,
or opinions. I mean concrete, measurable things that moved the
organization forward.</p>
<p>Everyone in your organization is contributing to these stories and
it's your job to feed the stories back.</p>
<p>Leaders have a tendency to hear about successes but don't always
remember to propagate the stories down. I've been guilty of this
myself. This post is your (and my own) friendly reminder.</p>
<p>If you don't keep reminding your folks their organization is making a
positive impact, they're going to forget it. You'll miss out on the
freely available chance to give reassurance to your best people.</p>
<p>Talented folks want to be invested in an organization that is
succeeding.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a post for all the people managers, directors, VPs out there: you need to regularly share success stories with your whole organization. Everyone wants to be part of an organization that is doing good work.<a href="https://t.co/XgaY5Ri1tA">https://t.co/XgaY5Ri1tA</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1407451413156929537?ref_src=twsrc%5Etfw">June 22, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/leaders-share-company-success-stories.htmlTue, 22 Jun 2021 00:00:00 +0000
- Languages you can run in the browser, part 1: Python, JavaScript, SQLitehttp://notes.eatonphil.com/languages-you-can-run-in-the-browser.html<head>
<meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-06-16-languages-you-can-run-in-the-browser.html'" />
</head><p>This is an external post of mine. Click
<a href="https://datastation.multiprocess.io/blog/2021-06-16-languages-you-can-run-in-the-browser.html">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/languages-you-can-run-in-the-browser.htmlThu, 17 Jun 2021 00:00:00 +0000
- Coolest hard-tech companies in NYC 2021http://notes.eatonphil.com/coolest-tech-companies-in-nyc-2021.html<p>For years I've kept a private list of really cool tech companies in
NYC. Now that I'm funemployed it's the perfect time to publish. This
list is influenced by 1) my perception of the difficulty of the
engineering behind the product and 2) the company's educational and
OSS presence.</p>
<p>With no further ado and in no particular order, here's my list!</p>
<h3 id="backtrace">Backtrace</h3><p>This company builds a product for debugging
mobile crashes. Your app produces a crash dump and their debugger will
help you figure out what went wrong. That's freaking awesome.</p>
<p><a href="https://backtrace.io">https://backtrace.io</a></p>
<h3 id="equinix-metal-(previously-packet)">Equinix Metal (previously Packet)</h3><p>This company provides an API around scheduling hardware servers in
their datacenters, not virtual machines. That's nuts.</p>
<p><a href="https://packet.com">https://packet.com</a></p>
<h3 id="digital-ocean">DigitalOcean</h3><p>OK, I used to work for Linode and am still a massive fan, but I love all
the clouds and this post is about NYC, not Philly. If you want to learn
how Linux works you have to work here.</p>
<p><a href="https://www.digitalocean.com/">https://www.digitalocean.com/</a></p>
<h3 id="ns1">NS1</h3><p>This company does DNS. Seeing as <a href="https://www.cyberciti.biz/humour/a-haiku-about-dns/">it was
DNS</a>, if you want
to understand how the internet works go work for this group.</p>
<p><a href="https://ns1.com/">https://ns1.com/</a></p>
<h3 id="squarespace">Squarespace</h3><p>The first program I made in 7th grade was a Java program that
generated HTML from terminal prompts in my first attempt at a
CMS. Stuff that builds stuff is amazing and Squarespace is kinda OG.</p>
<p>They also just IPO-ed so the comp won't be imaginary!</p>
<p>Disclosure: my wife works here, but they've been on my list longer
than that.</p>
<p><a href="https://www.squarespace.com/">https://www.squarespace.com/</a></p>
<h3 id="grafana">Grafana</h3><p>Amazing platform. Everyone who can't afford Splunk or doesn't want to
buy a competitor's product uses Elasticsearch and Grafana. I didn't
realize until double-checking my research that Grafana is even based
in NYC. Let's hope they're hiring developers here.</p>
<p><a href="https://grafana.com/">https://grafana.com/</a></p>
<h3 id="frame.io">Frame.io</h3><p>It's like Figma for video. Clearly the future.</p>
<p><a href="https://www.frame.io/">https://www.frame.io/</a></p>
<h3 id="datadog">Datadog</h3><p>Datadog feels like the only real competitor in hosted server analytics.</p>
<p>Their stock has been doing surprisingly well, or maybe I'm just tired
from WeWork, Uber, et al.</p>
<p><a href="https://www.datadoghq.com/">https://www.datadoghq.com/</a></p>
<h3 id="chronosphere">Chronosphere</h3><p>I'm a sucker for startups doing hosted data and search because that's
really hard. Chronosphere does Uber-scale log storage/analysis.</p>
<p><a href="https://chronosphere.io/">https://chronosphere.io/</a></p>
<h3 id="cockroach-labs">Cockroach Labs</h3><p>Worst company name but maybe one of the single coolest products in
NYC. They built a PostgreSQL compatible scalable platform in
Go. Everything about that is amazing.</p>
<p>They've also turned down my application like 5 times now though so
maybe they're very picky. :)</p>
<p><a href="https://www.cockroachlabs.com/">https://www.cockroachlabs.com/</a></p>
<h3 id="mongodb">MongoDB</h3><p>It's cloud scale! Need more be said?</p>
<p><a href="https://www.mongodb.com/">https://www.mongodb.com/</a></p>
<h3 id="trail-of-bits">Trail of Bits</h3><p>I don't actually understand what they do or if they have a product but
their <a href="https://github.com/trailofbits">GitHub presence</a> is amazing and
they're dedicated to educating the community which is one of the most
important things I think a company can do.</p>
<p><a href="https://www.trailofbits.com/">https://www.trailofbits.com/</a></p>
<h3 id="capsule8">Capsule8</h3><p>I moved to NYC for this company because the founders and product are
insane. If you want to learn how compilers and Linux don't work,
you've got to come here.</p>
<p>Disclosure: I own stock.</p>
<p><a href="https://capsule8.com/">https://capsule8.com/</a></p>
<h3 id="two-sigma">Two Sigma</h3><p>Algorithmic trading? Maybe the smartest guys in NYC? They don't accept
candidates without bachelor's degrees or they just don't like me. ;)
They also host the only good tech meetups in NYC: Linux User Group and
Papers We Love.</p>
<p><a href="https://www.twosigma.com/">https://www.twosigma.com/</a></p>
<h3 id="jane-street">Jane Street</h3><p>Another algorithmic trading company but this time with OCaml. They're
so crazy <a href="https://blog.janestreet.com/what-the-interns-have-wrought-2018/">you should see what the intern
built</a>.</p>
<p><a href="https://www.janestreet.com/">https://www.janestreet.com/</a></p>
<h3 id="vimeo">Vimeo</h3><p>Everybody loves an underdog story. And the <a href="https://www.linkedin.com/pulse/now-shes-ceo-vimeo-after-rejected-dozens-companies-mamta-shah-/">CEO seems really
cool</a>.</p>
<p><a href="https://vimeo.com/">https://vimeo.com/</a></p>
<h3 id="etsy">Etsy</h3><p>Their blog posts and engineering organization philosophy are highly
regarded. And they've got a sweet headquarters in Brooklyn.</p>
<p><a href="https://www.etsy.com/">https://www.etsy.com/</a></p>
<h3 id="sisense">Sisense</h3><p>If you're not using Elasticsearch and you're not using Splunk, you
might be using Sisense. Again, I'm a big sucker for data and analytics
platforms.</p>
<p><a href="https://www.sisense.com/">https://www.sisense.com/</a></p>
<h3 id="codeacademy">Codecademy</h3><p>I am 100% on board with giving people opportunities in tech.</p>
<p><a href="https://www.codecademy.com/">https://www.codecademy.com/</a></p>
<h3 id="stack-overflow">Stack Overflow</h3><p>They were just bought! But they still exist I suppose. If you love
.NET you've got to work here.</p>
<p><a href="https://stackoverflow.com/">https://stackoverflow.com/</a></p>
<h3 id="that's-it!">That's it!</h3><p>Tell me what you think and if I'm missing any hard-tech companies in
NYC. I'm sure I am.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Here's my light-hearted take on some of the coolest tech companies in NYC in 2021<a href="https://twitter.com/Frame_io?ref_src=twsrc%5Etfw">@Frame_io</a> <a href="https://twitter.com/equinixmetal?ref_src=twsrc%5Etfw">@equinixmetal</a> <a href="https://twitter.com/digitalocean?ref_src=twsrc%5Etfw">@digitalocean</a> <a href="https://twitter.com/capsule8?ref_src=twsrc%5Etfw">@capsule8</a> <a href="https://twitter.com/NS1?ref_src=twsrc%5Etfw">@NS1</a> <a href="https://twitter.com/grafana?ref_src=twsrc%5Etfw">@grafana</a> <a href="https://twitter.com/CockroachDB?ref_src=twsrc%5Etfw">@CockroachDB</a> <a href="https://twitter.com/squarespace?ref_src=twsrc%5Etfw">@squarespace</a> <a href="https://twitter.com/chronosphereio?ref_src=twsrc%5Etfw">@chronosphereio</a> <a href="https://twitter.com/datadoghq?ref_src=twsrc%5Etfw">@datadoghq</a> <a href="https://twitter.com/MongoDB?ref_src=twsrc%5Etfw">@MongoDB</a> <a href="https://twitter.com/trailofbits?ref_src=twsrc%5Etfw">@trailofbits</a> <a href="https://twitter.com/twosigma?ref_src=twsrc%5Etfw">@twosigma</a> <a href="https://twitter.com/Vimeo?ref_src=twsrc%5Etfw">@Vimeo</a> <a href="https://twitter.com/Etsy?ref_src=twsrc%5Etfw">@Etsy</a> and more<a href="https://t.co/ZAcvptvLbZ">https://t.co/ZAcvptvLbZ</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1400815765117353989?ref_src=twsrc%5Etfw">June 4, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/coolest-tech-companies-in-nyc-2021.htmlFri, 04 Jun 2021 00:00:00 +0000
- Writing a Jinja-inspired template library in Pythonhttp://notes.eatonphil.com/writing-a-template-library-in-python.html<p>In this post we'll build a minimal text templating library in Python
inspired by Jinja. It will be able to display variables and iterate
over arrays.</p>
<p>By the end of this article, with around 300 lines of code, we'll be
able to create this program:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pytemplate</span> <span class="kn">import</span> <span class="n">eval_template</span>
<span class="n">template</span> <span class="o">=</span> <span class="s1">'''</span>
<span class="s1"><html></span>
<span class="s1"> <body></span>
<span class="s1"> {</span><span class="si">% f</span><span class="s1">or-in(post, posts) %}</span>
<span class="s1"> <article></span>
<span class="s1"> <h1>{{ get(post, 'title') }}</h1></span>
<span class="s1"> <p></span>
<span class="s1"> {{ get(post, 'body') }}</span>
<span class="s1"> </p></span>
<span class="s1"> </article></span>
<span class="s1"> {</span><span class="si">% e</span><span class="s1">ndfor-in %}</span>
<span class="s1"> </body></span>
<span class="s1"></html></span>
<span class="s1">'''</span>
<span class="n">env</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'posts'</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s1">'title'</span><span class="p">:</span> <span class="s1">'Hello world!'</span><span class="p">,</span>
<span class="s1">'body'</span><span class="p">:</span> <span class="s1">'This is my first post!'</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s1">'title'</span><span class="p">:</span> <span class="s1">'Take two'</span><span class="p">,</span>
<span class="s1">'body'</span><span class="p">:</span> <span class="s1">'This is a second post.'</span><span class="p">,</span>
<span class="p">},</span>
<span class="p">],</span>
<span class="p">}</span>
<span class="nb">print</span><span class="p">(</span><span class="n">eval_template</span><span class="p">(</span><span class="n">template</span><span class="p">,</span> <span class="n">env</span><span class="p">))</span>
</pre></div>
<p>That runs and produces what we expect:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>python3<span class="w"> </span>test.py
<html>
<span class="w"> </span><body>
<span class="w"> </span><article>
<span class="w"> </span><h1>Hello<span class="w"> </span>world!</h1>
<span class="w"> </span><p>
<span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>my<span class="w"> </span>first<span class="w"> </span>post!
<span class="w"> </span></p>
<span class="w"> </span></article>
<span class="w"> </span><article>
<span class="w"> </span><h1>Take<span class="w"> </span>two</h1>
<span class="w"> </span><p>
<span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>second<span class="w"> </span>post.
<span class="w"> </span></p>
<span class="w"> </span></article>
<span class="w"> </span></body>
</html>
</pre></div>
<p>All code is available on
<a href="https://github.com/eatonphil/pytemplate">GitHub</a>. Let's dig in.</p>
<h3 id="specification">Specification</h3><p>In this templating language, pytemplate, <code>{% $function () %}
... {% end$function %}</code> blocks are specially evaluated depending
on the particular function being called. For example, the <code>for-in
($iter_name, $array)</code> function will duplicate its children for
every element in <code>$array</code>. Within the body of the loop, the
variable <code>$iter_name</code> will exist and be set to the current
element in the array.</p>
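<p>As a rough illustration of that rule (a sketch under assumed names,
not the library's actual code): the loop body is modeled here as a
hypothetical <code>render_children</code> callable, and
<code>for_in</code> calls it once per element with the loop variable
bound in a copy of the environment.</p>

```python
def for_in(iter_name, array, render_children, env):
    """Render the loop body once per element of `array`.

    `render_children` is a hypothetical callable standing in for the
    block's children: it takes an environment dict and returns text.
    """
    out = []
    for element in array:
        # Bind the loop variable in a copy so it does not leak into
        # the caller's environment.
        child_env = {**env, iter_name: element}
        out.append(render_children(child_env))
    return ''.join(out)


def render_title(env):
    # Stand-in for the children of the for-in block.
    return '<h1>' + env['post']['title'] + '</h1>\n'


env = {'posts': [{'title': 'Hello world!'}, {'title': 'Take two'}]}
rendered = for_in('post', env['posts'], render_title, env)
print(rendered, end='')
```

<p>Copying the environment per iteration is one possible design choice;
it keeps <code>post</code> from being visible after the loop ends.</p>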
<p>While we won't implement it here, you can imagine what the <code>if
($test)</code> block function might do.</p>
<h3 id="arguments,-expressions,-function-calls:-nodes">Arguments, expressions, function calls: nodes</h3><p>Function arguments are expressions (or <code>nodes</code> as we'll
call them). They can be strings (surrounded by single quotes),
identifiers found in a provided dictionary (or
<code>environment</code> as we'll call it), or nested function calls
(also called nodes).</p>
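<p>To make the three kinds of node concrete, here is a minimal sketch of
evaluating one against an environment. The dictionary shape
(<code>type</code>, <code>value</code>, <code>name</code>,
<code>args</code>) and the <code>eval_node</code> name are assumptions
made for illustration; only the <code>get</code> function comes from
the example template above.</p>

```python
def eval_node(node, env):
    """Evaluate a node (hypothetical dict representation) against env."""
    kind = node['type']
    if kind == 'string':
        # String literals evaluate to themselves.
        return node['value']
    if kind == 'identifier':
        # Identifiers are looked up in the environment dictionary.
        return env[node['value']]
    if kind == 'function':
        # Arguments are nodes too, so evaluate them recursively.
        args = [eval_node(arg, env) for arg in node['args']]
        if node['name'] == 'get':
            container, key = args
            return container[key]
        raise ValueError('Unknown function: ' + node['name'])
    raise ValueError('Unknown node type: ' + kind)


# Roughly what get(post, 'title') might look like once parsed.
node = {
    'type': 'function',
    'name': 'get',
    'args': [
        {'type': 'identifier', 'value': 'post'},
        {'type': 'string', 'value': 'title'},
    ],
}
title = eval_node(node, {'post': {'title': 'Hello world!'}})
print(title)
```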
<h3 id="non-blocks:-tags">Non-blocks: tags</h3><p>The non-block <code>{{ ... }}</code> syntax is just called a tag. The
inside of a tag is a node and is evaluated the same way a function
argument is.</p>
<h3 id="architecture">Architecture</h3><p>We'll break up the library into a few main parts:</p>
<ul>
<li>Lexer for the node language</li>
<li>Parser for the node language</li>
<li>Lexer for blocks, tags, and text</li>
<li>Parser for blocks, tags, and text</li>
<li>Interpreter that takes an AST and an environment dictionary and produces text</li>
<li>An entrypoint to tie all the above together</li>
</ul>
<p>We'll tackle these aspects in roughly reverse order.</p>
<h3 id="entrypoint">Entrypoint</h3><p>When we call the library we want to be able to just accept a template
string and an environment dictionary. The result of the entrypoint
will be the evaluated template.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">io</span>
<span class="k">def</span> <span class="nf">eval_template</span><span class="p">(</span><span class="n">template</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">env</span><span class="p">:</span> <span class="nb">dict</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="n">tokens</span> <span class="o">=</span> <span class="n">lex</span><span class="p">(</span><span class="n">template</span><span class="p">)</span>
    <span class="n">ast</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span>
    <span class="k">with</span> <span class="n">io</span><span class="o">.</span><span class="n">StringIO</span><span class="p">()</span> <span class="k">as</span> <span class="n">memfd</span><span class="p">:</span>
        <span class="n">interpret</span><span class="p">(</span><span class="n">memfd</span><span class="p">,</span> <span class="n">ast</span><span class="p">,</span> <span class="n">env</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">memfd</span><span class="o">.</span><span class="n">getvalue</span><span class="p">()</span>
</pre></div>
<p>Here <code>lex</code>, <code>parse</code>, and <code>interpret</code>
operate on the block- and tag-level language.</p>
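<p>One detail worth spelling out: the buffer has to be read while it is
still open, which is why the <code>return</code> sits inside the
<code>with</code> block. A small standalone sketch of that pattern
(the <code>render</code> helper is made up for this example):</p>

```python
import io


def render(parts):
    # Write pieces into an in-memory buffer, much as an interpreter
    # might write evaluated text.
    with io.StringIO() as memfd:
        for part in parts:
            memfd.write(part)
        # Read the buffer before the with block closes it.
        return memfd.getvalue()


text = render(['Hello', ' ', 'world!'])
print(text)

# Calling getvalue() on a closed StringIO raises ValueError.
closed = io.StringIO()
closed.write('x')
closed.close()
try:
    closed.getvalue()
    raised = False
except ValueError:
    raised = True
print(raised)
```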
<h3 id="block,-tag-and-text-lexing">Block, tag and text lexing</h3><p>This process is responsible for turning the template string into an
array of tokens. To make the code simpler, lexing for the function
call and expression language is done separately. At this stage the
only special tokens are the markers that open and close blocks and
tags:
<code>{%</code>, <code>%}</code>, <code>{{</code>, and <code>}}</code>.
Anything else is regular text.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span><span class="n">BLOCK_OPEN</span> <span class="o">=</span> <span class="s1">'{%'</span>
<span class="n">BLOCK_CLOSE</span> <span class="o">=</span> <span class="s1">'%}'</span>
<span class="n">TAG_OPEN</span> <span class="o">=</span> <span class="s1">'{{'</span>
<span class="n">TAG_CLOSE</span> <span class="o">=</span> <span class="s1">'}}'</span>
<span class="k">def</span> <span class="nf">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="p">):</span>
<span class="k">if</span> <span class="n">cursor</span> <span class="o"><</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="k">if</span> <span class="n">cursor</span> <span class="o"><</span> <span class="nb">len</span><span class="p">(</span><span class="n">source</span><span class="p">):</span>
<span class="k">return</span> <span class="n">source</span><span class="p">[</span><span class="n">cursor</span><span class="p">]</span>
<span class="k">return</span> <span class="kc">None</span>
<span class="k">def</span> <span class="nf">lex</span><span class="p">(</span><span class="n">source</span><span class="p">):</span>
<span class="n">tokens</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">current</span> <span class="o">=</span> <span class="s1">''</span>
<span class="n">cursor</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="n">cursor</span> <span class="o"><</span> <span class="nb">len</span><span class="p">(</span><span class="n">source</span><span class="p">):</span>
<span class="n">char</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span>
<span class="k">if</span> <span class="n">char</span> <span class="o">==</span> <span class="s1">'{'</span><span class="p">:</span>
<span class="c1"># Handle escaping {</span>
<span class="k">if</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">==</span> <span class="s1">'{'</span><span class="p">:</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">continue</span>
<span class="n">next_char</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
<span class="k">if</span> <span class="n">next_char</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'%'</span><span class="p">,</span> <span class="s1">'{'</span><span class="p">]:</span>
<span class="k">if</span> <span class="n">current</span><span class="p">:</span>
<span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">current</span><span class="p">,</span>
<span class="s1">'cursor'</span><span class="p">:</span> <span class="n">cursor</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">current</span><span class="p">),</span>
<span class="p">})</span>
<span class="n">current</span> <span class="o">=</span> <span class="s1">''</span>
<span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">BLOCK_OPEN</span> <span class="k">if</span> <span class="n">next_char</span> <span class="o">==</span> <span class="s1">'%'</span> <span class="k">else</span> <span class="n">TAG_OPEN</span><span class="p">,</span>
<span class="s1">'cursor'</span><span class="p">:</span> <span class="n">cursor</span><span class="p">,</span>
<span class="p">})</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">2</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">char</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'%'</span><span class="p">,</span> <span class="s1">'}'</span><span class="p">]:</span>
<span class="c1"># Handle escaping % and }</span>
<span class="k">if</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">==</span> <span class="n">char</span><span class="p">:</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="o">!=</span> <span class="s1">'}'</span><span class="p">:</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">current</span><span class="p">:</span>
<span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">current</span><span class="p">,</span>
<span class="s1">'cursor'</span><span class="p">:</span> <span class="n">cursor</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">current</span><span class="p">),</span>
<span class="p">})</span>
<span class="n">current</span> <span class="o">=</span> <span class="s1">''</span>
<span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">BLOCK_CLOSE</span> <span class="k">if</span> <span class="n">char</span> <span class="o">==</span> <span class="s1">'%'</span> <span class="k">else</span> <span class="n">TAG_CLOSE</span><span class="p">,</span>
<span class="s1">'cursor'</span><span class="p">:</span> <span class="n">cursor</span><span class="p">,</span>
<span class="p">})</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">2</span>
<span class="k">continue</span>
<span class="n">current</span> <span class="o">+=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">current</span><span class="p">:</span>
<span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">current</span><span class="p">,</span>
<span class="s1">'cursor'</span><span class="p">:</span> <span class="n">cursor</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">current</span><span class="p">),</span>
<span class="p">})</span>
<span class="k">return</span> <span class="n">tokens</span>
</pre></div>
<p>That's it for lexing!</p>
<h3 id="block,-tag-and-text-parsing">Block, tag and text parsing</h3><p>Next up is a matter of finding the ending/closing patterns in the array of tokens. There are a few main rules we'll look for:</p>
<ul>
<li>Every open tag symbol <code>{{</code> must be followed by a text token then a closing tag symbol <code>}}</code><ul>
<li>The text within the open and close tag must parse into a valid expression (we'll define this logic later)</li>
</ul>
</li>
<li>Every block symbol <code>{%</code> must be followed by a text token then an end of block symbol <code>%}</code><ul>
<li>The text token within the open and close block must parse into a valid function call (we'll define this logic later)</li>
</ul>
</li>
<li>Every block must have a matching end block whose text is <code>end</code> prepended to the name of the function called in the start block (so a block that calls <code>if</code> is closed by <code>endif</code>)<ul>
<li>The text between two blocks can contain nested blocks or tags</li>
</ul>
</li>
</ul>
<p>Let's codify that:</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">end_of_block_marker</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
<span class="n">cursor</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">ast</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">while</span> <span class="n">cursor</span> <span class="o"><</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">t</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">value</span> <span class="o">==</span> <span class="n">TAG_OPEN</span><span class="p">:</span>
<span class="k">if</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">2</span><span class="p">)[</span><span class="s1">'value'</span><span class="p">]</span> <span class="o">!=</span> <span class="n">TAG_CLOSE</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected closing tag'</span><span class="p">)</span>
<span class="n">node_tokens</span> <span class="o">=</span> <span class="n">lex_node</span><span class="p">(</span><span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">1</span><span class="p">)[</span><span class="s1">'value'</span><span class="p">])</span>
<span class="n">node_ast</span> <span class="o">=</span> <span class="n">parse_node</span><span class="p">(</span><span class="n">node_tokens</span><span class="p">)</span>
<span class="n">ast</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'tag'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">node_ast</span><span class="p">,</span>
<span class="p">})</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">3</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">value</span> <span class="o">==</span> <span class="n">TAG_CLOSE</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected opening tag'</span><span class="p">)</span>
<span class="k">if</span> <span class="n">value</span> <span class="o">==</span> <span class="n">BLOCK_OPEN</span><span class="p">:</span>
<span class="k">if</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">2</span><span class="p">)[</span><span class="s1">'value'</span><span class="p">]</span> <span class="o">!=</span> <span class="n">BLOCK_CLOSE</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected end of block open'</span><span class="p">)</span>
<span class="n">block</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
<span class="n">node_tokens</span> <span class="o">=</span> <span class="n">lex_node</span><span class="p">(</span><span class="n">block</span><span class="p">[</span><span class="s1">'value'</span><span class="p">])</span>
<span class="n">node_ast</span> <span class="o">=</span> <span class="n">parse_node</span><span class="p">(</span><span class="n">node_tokens</span><span class="p">)</span>
<span class="k">if</span> <span class="n">end_of_block_marker</span> <span class="ow">and</span> <span class="s1">'end'</span><span class="o">+</span><span class="n">end_of_block_marker</span> <span class="o">==</span> <span class="n">node_ast</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]:</span>
<span class="k">return</span> <span class="n">ast</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">3</span>
<span class="n">child</span><span class="p">,</span> <span class="n">cursor_offset</span> <span class="o">=</span> <span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="n">cursor</span><span class="o">+</span><span class="mi">3</span><span class="p">:],</span> <span class="n">node_ast</span><span class="p">[</span><span class="s1">'value'</span><span class="p">])</span>
<span class="k">if</span> <span class="n">cursor_offset</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Failed to find end of block'</span><span class="p">)</span>
<span class="n">ast</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'block'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">node_ast</span><span class="p">,</span>
<span class="s1">'child'</span><span class="p">:</span> <span class="n">child</span><span class="p">,</span>
<span class="p">})</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="n">cursor_offset</span> <span class="o">+</span> <span class="mi">3</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">value</span> <span class="o">==</span> <span class="n">BLOCK_CLOSE</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected start of block open'</span><span class="p">)</span>
<span class="n">ast</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'text'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">t</span><span class="p">,</span>
<span class="p">})</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">return</span> <span class="n">ast</span><span class="p">,</span> <span class="n">cursor</span>
</pre></div>
<p>And that's it for parsing blocks and tags. Now we have to get into the
node language.</p>
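<p>The third rule is the easiest one to get wrong, so here is a minimal
sketch of it in isolation. It is not part of <code>pytemplate.py</code>:
<code>check_blocks</code> is a hypothetical helper that looks only at the
block names, in order, and uses an explicit stack where <code>parse</code>
uses recursion. Like <code>parse</code>, it treats any block that is not the
expected end marker as the start of a new block.</p>

```python
def check_blocks(block_names):
    """Return True when every block is closed by 'end' + its own name."""
    open_blocks = []  # names of blocks we are currently inside
    for name in block_names:
        if open_blocks and name == 'end' + open_blocks[-1]:
            # This is the end marker for the innermost open block
            open_blocks.pop()
        else:
            # Anything else opens a new (possibly nested) block
            open_blocks.append(name)
    return not open_blocks


print(check_blocks(['for-in', 'if', 'endif', 'endfor-in']))  # True
print(check_blocks(['if', 'endfor-in']))                     # False
```

<p>The second call fails because <code>endfor-in</code> does not close
<code>if</code>, so it is read as a new block that is never closed.</p>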
<h3 id="node-lexing">Node lexing</h3><p>In the node language, everything is either a literal or a function
call. Whitespace is ignored. The only special symbols in the node
language are commas and parentheses.</p>
<p>So to break the text into tokens we iterate over the characters,
accumulating any that are neither whitespace nor a symbol. When we hit
whitespace or a symbol, we emit whatever has accumulated as a literal
token. Symbols become tokens of their own and whitespace is dropped.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex_node</span><span class="p">(</span><span class="n">source</span><span class="p">):</span>
<span class="n">tokens</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">cursor</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">current</span> <span class="o">=</span> <span class="s1">''</span>
<span class="k">while</span> <span class="n">cursor</span> <span class="o"><</span> <span class="nb">len</span><span class="p">(</span><span class="n">source</span><span class="p">):</span>
<span class="n">char</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span>
<span class="k">if</span> <span class="n">char</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'</span><span class="se">\r</span><span class="s1">'</span><span class="p">,</span> <span class="s1">'</span><span class="se">\t</span><span class="s1">'</span><span class="p">,</span> <span class="s1">'</span><span class="se">\n</span><span class="s1">'</span><span class="p">,</span> <span class="s1">' '</span><span class="p">]:</span>
<span class="k">if</span> <span class="n">current</span><span class="p">:</span>
<span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">current</span><span class="p">,</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
<span class="p">})</span>
<span class="n">current</span> <span class="o">=</span> <span class="s1">''</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">char</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">'('</span><span class="p">,</span> <span class="s1">')'</span><span class="p">,</span> <span class="s1">','</span><span class="p">]:</span>
<span class="k">if</span> <span class="n">current</span><span class="p">:</span>
<span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">current</span><span class="p">,</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
<span class="p">})</span>
<span class="n">current</span> <span class="o">=</span> <span class="s1">''</span>
<span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">char</span><span class="p">,</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'syntax'</span><span class="p">,</span>
<span class="p">})</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">continue</span>
<span class="n">current</span> <span class="o">+=</span> <span class="n">char</span>
<span class="n">cursor</span> <span class="o">+=</span><span class="mi">1</span>
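    <span class="k">if</span> <span class="n">current</span><span class="p">:</span>
        <span class="c1"># Flush a literal that runs to the end of the input</span>
        <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span>
            <span class="s1">'value'</span><span class="p">:</span> <span class="n">current</span><span class="p">,</span>
            <span class="s1">'type'</span><span class="p">:</span> <span class="s1">'literal'</span><span class="p">,</span>
        <span class="p">})</span>
    <span class="k">return</span> <span class="n">tokens</span>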
</pre></div>
<p>And that's it for node lexing.</p>
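<p>To see what the token list looks like, here is a small self-contained
sketch. It swaps the character loop for <code>re.findall</code>, so it is a
different technique and not the code above, and <code>lex_node_sketch</code>
is a hypothetical name. For the input shown it should yield the same tokens.
It also emits a literal that runs right up to the end of the input.</p>

```python
import re

SYMBOLS = {'(', ')', ','}


def lex_node_sketch(source):
    """Split node-language text into literal and syntax tokens."""
    tokens = []
    # Either a single symbol, or a run of characters that are
    # neither whitespace nor a symbol.
    for value in re.findall(r"[(),]|[^\s(),]+", source):
        kind = 'syntax' if value in SYMBOLS else 'literal'
        tokens.append({'value': value, 'type': kind})
    return tokens


for token in lex_node_sketch("get(post, 'title')"):
    print(token['type'], token['value'])
```

<p>Note that quotes get no special treatment here or in the loop above, so a
quoted string containing a space is split into two literals.</p>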
<h3 id="node-parsing">Node parsing</h3><p>We'll break this up into two functions. The first is just for parsing
literals and function calls.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse_node</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
<span class="n">cursor</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">ast</span> <span class="o">=</span> <span class="kc">None</span>
<span class="k">while</span> <span class="n">cursor</span> <span class="o"><</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span>
<span class="k">if</span> <span class="n">t</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'literal'</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected literal'</span><span class="p">)</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="n">next_t</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span>
<span class="k">if</span> <span class="ow">not</span> <span class="n">next_t</span><span class="p">:</span>
<span class="n">ast</span> <span class="o">=</span> <span class="n">t</span>
<span class="k">break</span>
<span class="k">if</span> <span class="n">next_t</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">'('</span><span class="p">:</span>
<span class="n">ast</span> <span class="o">=</span> <span class="n">t</span>
<span class="k">break</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">next_t</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'('</span><span class="p">:</span>
<span class="n">args</span><span class="p">,</span> <span class="n">cursor</span> <span class="o">=</span> <span class="n">parse_node_args</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="n">cursor</span><span class="p">:])</span>
<span class="n">ast</span> <span class="o">=</span> <span class="p">{</span>
<span class="s1">'type'</span><span class="p">:</span> <span class="s1">'function'</span><span class="p">,</span>
<span class="s1">'value'</span><span class="p">:</span> <span class="n">t</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span><span class="o">.</span><span class="n">strip</span><span class="p">(),</span>
<span class="s1">'args'</span><span class="p">:</span> <span class="n">args</span><span class="p">,</span>
<span class="p">}</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">2</span>
<span class="k">break</span>
<span class="k">if</span> <span class="n">cursor</span> <span class="o">!=</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Failed to parse node: '</span> <span class="o">+</span> <span class="n">tokens</span><span class="p">[</span><span class="n">cursor</span><span class="p">][</span><span class="s1">'value'</span><span class="p">])</span>
<span class="k">return</span> <span class="n">ast</span>
</pre></div>
<p>The second is for parsing function call arguments.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse_node_args</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
<span class="n">args</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">cursor</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="n">cursor</span> <span class="o"><</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span>
<span class="k">if</span> <span class="n">t</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">')'</span><span class="p">:</span>
<span class="k">return</span> <span class="n">args</span><span class="p">,</span> <span class="n">cursor</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">args</span><span class="p">)</span> <span class="ow">and</span> <span class="n">t</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">','</span><span class="p">:</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">elif</span> <span class="nb">len</span><span class="p">(</span><span class="n">args</span><span class="p">)</span> <span class="ow">and</span> <span class="n">t</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">','</span><span class="p">:</span>
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected comma to separate args'</span><span class="p">)</span>
<span class="n">args</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="p">))</span>
<span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="k">return</span> <span class="n">args</span><span class="p">,</span> <span class="n">cursor</span>
</pre></div>
<p>And that's it for lexing and parsing the entire template and
node language!</p>
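<p>To make the shape of a node AST concrete, here is what I'd expect
<code>parse_node</code> to return for <code>get(post, 'title')</code>. The
literal is written out by hand from tracing the code, not produced by
running it, and <code>node_to_source</code> is a hypothetical helper added
only to show how the pieces fit together.</p>

```python
# Hand-traced AST for the node text: get(post, 'title')
call_ast = {
    'type': 'function',
    'value': 'get',
    'args': [
        {'value': 'post', 'type': 'literal'},
        {'value': "'title'", 'type': 'literal'},
    ],
}


def node_to_source(node):
    """Turn a node AST back into node-language text."""
    if node['type'] == 'literal':
        return node['value']
    args = ', '.join(node_to_source(arg) for arg in node['args'])
    return node['value'] + '(' + args + ')'


print(node_to_source(call_ast))  # get(post, 'title')
```

<p>As far as I can tell from <code>parse_node_args</code>, arguments are kept
as raw literal tokens, so a function call nested inside another call's
arguments is not supported by this parser.</p>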
<h3 id="interpreting">Interpreting</h3><p>Interpreting is a matter of iterating over the AST recursively,
writing out literal text, evaluating the contents of tags, and doing
special processing for blocks.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">interpret</span><span class="p">(</span><span class="n">outfd</span><span class="p">,</span> <span class="n">ast</span><span class="p">,</span> <span class="n">env</span><span class="p">):</span>
<span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">ast</span><span class="p">:</span>
<span class="n">item_type</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">if</span> <span class="n">item_type</span> <span class="o">==</span> <span class="s1">'text'</span><span class="p">:</span>
<span class="n">outfd</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">])</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">item_type</span> <span class="o">==</span> <span class="s1">'tag'</span><span class="p">:</span>
<span class="n">tag_value</span> <span class="o">=</span> <span class="n">interpret_node</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">env</span><span class="p">)</span>
<span class="n">outfd</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">tag_value</span><span class="p">)</span>
<span class="k">continue</span>
<span class="k">if</span> <span class="n">item_type</span> <span class="o">==</span> <span class="s1">'block'</span><span class="p">:</span>
<span class="n">interpret_block</span><span class="p">(</span><span class="n">outfd</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="n">item</span><span class="p">[</span><span class="s1">'child'</span><span class="p">],</span> <span class="n">env</span><span class="p">)</span>
<span class="k">continue</span>
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Unknown type: '</span> <span class="o">+</span> <span class="n">item_type</span><span class="p">)</span>
</pre></div>
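<p>Two details are easy to miss here. First, <code>outfd</code> only needs a
<code>write</code> method, so anything file-like works. Second, a text item
stores the whole lexer token, which is why the text is read as
<code>node['value']</code>. The sketch below shows both with a cut-down
walker that handles text items only. <code>write_text_items</code> is a
hypothetical name and the AST (including its <code>cursor</code> value) is
made up for illustration.</p>

```python
import io


def write_text_items(outfd, ast):
    """Write out only the text items of a template AST."""
    for item in ast:
        if item['type'] == 'text':
            token = item['value']         # the lexer token
            outfd.write(token['value'])   # the text itself


# Hand-written AST for a template that is nothing but text
ast = [
    {'type': 'text', 'value': {'value': 'Hello world!', 'cursor': 0}},
]

buf = io.StringIO()  # captures output in memory instead of a real file
write_text_items(buf, ast)
print(buf.getvalue())  # Hello world!
```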
<h4 id="intepreting-nodes">Interpreting nodes</h4><p>A node is one of two things:</p>
<ul>
<li>A literal which is either a<ul>
<li>String if surrounded by single quotes</li>
<li>Otherwise an identifier to be looked up in the environment dictionary</li>
</ul>
</li>
<li>Or a function call</li>
</ul>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">interpret_node</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">env</span><span class="p">):</span>
<span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'type'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'literal'</span><span class="p">:</span>
<span class="c1"># Is a string</span>
<span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"'"</span> <span class="ow">and</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="s2">"'"</span><span class="p">:</span>
<span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">][</span><span class="mi">1</span><span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="c1"># Default to an env lookup</span>
<span class="k">return</span> <span class="n">env</span><span class="p">[</span><span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]]</span>
<span class="n">function</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
<span class="n">args</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'args'</span><span class="p">]</span>
</pre></div>
<p>Let's define <code>==</code> which checks if all args are equal. First
we have to interpret all args and then we return True if they are all
equal.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span> <span class="k">if</span> <span class="n">function</span> <span class="o">==</span> <span class="s1">'=='</span><span class="p">:</span>
<span class="n">arg_vals</span> <span class="o">=</span> <span class="p">[</span><span class="n">interpret_node</span><span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="n">env</span><span class="p">)</span> <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">args</span><span class="p">]</span>
<span class="k">if</span> <span class="n">arg_vals</span><span class="o">.</span><span class="n">count</span><span class="p">(</span><span class="n">arg_vals</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">==</span> <span class="nb">len</span><span class="p">(</span><span class="n">arg_vals</span><span class="p">):</span>
<span class="k">return</span> <span class="kc">True</span>
<span class="k">return</span> <span class="kc">False</span>
</pre></div>
<p>Now let's define a helper for retrieving an entry from a dictionary,
called <code>get</code>. This will evaluate its first arg and assume
it is a dictionary. Then it will evaluate its second arg and assume it
is a key in the dictionary. Then it will return the result of looking
up the key in the dictionary.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span> <span class="k">if</span> <span class="n">function</span> <span class="o">==</span> <span class="s1">'get'</span><span class="p">:</span>
<span class="n">arg_vals</span> <span class="o">=</span> <span class="p">[</span><span class="n">interpret_node</span><span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="n">env</span><span class="p">)</span> <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">args</span><span class="p">]</span>
<span class="k">return</span> <span class="n">arg_vals</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="n">arg_vals</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span>
</pre></div>
<p>And if it's neither of these supported functions, just raise an error.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Unknown function: '</span> <span class="o">+</span> <span class="n">function</span><span class="p">)</span>
</pre></div>
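<p>Here are those fragments assembled into one function, slightly condensed,
with a usage example. The <code>lit</code> helper and the sample
<code>env</code> are assumptions added for the demonstration. One small
difference from the fragments: this version evaluates the arguments once, up
front, before checking the function name.</p>

```python
def interpret_node(node, env):
    if node['type'] == 'literal':
        value = node['value']
        # Single-quoted literals are strings; anything else is an env lookup
        if value[0] == "'" and value[-1] == "'":
            return value[1:-1]
        return env[value]

    function = node['value']
    arg_vals = [interpret_node(arg, env) for arg in node['args']]
    if function == '==':
        # True when every argument equals the first one
        return arg_vals.count(arg_vals[0]) == len(arg_vals)
    if function == 'get':
        return arg_vals[0][arg_vals[1]]
    raise Exception('Unknown function: ' + function)


def lit(value):
    """Build a literal node by hand (hypothetical helper)."""
    return {'type': 'literal', 'value': value}


env = {'post': {'title': 'Hello world!'}, 'a': 1, 'b': 1}
get_call = {'type': 'function', 'value': 'get', 'args': [lit('post'), lit("'title'")]}
eq_call = {'type': 'function', 'value': '==', 'args': [lit('a'), lit('b')]}

title = interpret_node(get_call, env)
same = interpret_node(eq_call, env)
print(title)  # Hello world!
print(same)   # True
```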
<h4 id="interpreting-blocks">Interpreting blocks</h4><p>Blocks are just a little different than a generic node. In addition to
being evaluated they act on a child AST within the start and end of
the block.</p>
<p>For example, in an <code>if</code> block we will evaluate its argument
and recursively call <code>interpret</code> on the child AST if the argument is
truthy.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">interpret_block</span><span class="p">(</span><span class="n">outfd</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="n">child</span><span class="p">,</span> <span class="n">env</span><span class="p">):</span>
    <span class="n">function</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'value'</span><span class="p">]</span>
    <span class="n">args</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">'args'</span><span class="p">]</span>
    <span class="k">if</span> <span class="n">function</span> <span class="o">==</span> <span class="s1">'if'</span><span class="p">:</span>
        <span class="c1"># Evaluate the block's argument, not the block node itself</span>
        <span class="k">if</span> <span class="n">interpret_node</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">env</span><span class="p">):</span>
            <span class="n">interpret</span><span class="p">(</span><span class="n">outfd</span><span class="p">,</span> <span class="n">child</span><span class="p">,</span> <span class="n">env</span><span class="p">)</span>
        <span class="k">return</span>
</pre></div>
<p>And for <code>for-in</code> we will use the first argument as the name
of an identifier to be copied into a child environment
dictionary. We'll interpret the second argument and then iterate over
it, calling <code>interpret</code> recursively for each item in the
array and passing the child environment dictionary so it has access to
the current element.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span> <span class="k">if</span> <span class="n">function</span> <span class="o">==</span> <span class="s1">'for-in'</span><span class="p">:</span>
<span class="n">loop_variable</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">loop_iter_variable</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s1">'value'</span><span class="p">]</span>
<span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="n">interpret_node</span><span class="p">(</span><span class="n">loop_variable</span><span class="p">,</span> <span class="n">env</span><span class="p">):</span>
<span class="n">child_env</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">child_env</span><span class="p">[</span><span class="n">loop_iter_variable</span><span class="p">]</span> <span class="o">=</span> <span class="n">elem</span>
<span class="n">interpret</span><span class="p">(</span><span class="n">outfd</span><span class="p">,</span> <span class="n">child</span><span class="p">,</span> <span class="n">child_env</span><span class="p">)</span>
<span class="k">return</span>
</pre></div>
<p>Just like before, if we see a block we don't support yet, throw an
error.</p>
<p><span class="code-caption">pytemplate.py</span></p>
<div class="highlight"><pre><span></span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Unsupported block node function: '</span> <span class="o">+</span> <span class="n">function</span><span class="p">)</span>
</pre></div>
<p>And that's that. :)</p>
<h3 id="run-it">Run it</h3><p>Now we can give the example from the beginning a shot.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>python3<span class="w"> </span>test.py
<html>
<span class="w"> </span><body>
<span class="w"> </span><article>
<span class="w"> </span><h1>Hello<span class="w"> </span>world!</h1>
<span class="w"> </span><p>
<span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>my<span class="w"> </span>first<span class="w"> </span>post!
<span class="w"> </span></p>
<span class="w"> </span></article>
<span class="w"> </span><article>
<span class="w"> </span><h1>Take<span class="w"> </span>two</h1>
<span class="w"> </span><p>
<span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>second<span class="w"> </span>post.
<span class="w"> </span></p>
<span class="w"> </span></article>
<span class="w"> </span></body>
</html>
</pre></div>
<p>Pretty sweet for only 300 lines of Python!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Been wanting to write a Python template library ever since I failed trying to do so years ago in Standard ML. Here's my take on a Jinja-like library!<a href="https://t.co/P1nAV6fSxk">https://t.co/P1nAV6fSxk</a> <a href="https://t.co/DbXQt1JYx8">pic.twitter.com/DbXQt1JYx8</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1396535283190046722?ref_src=twsrc%5Etfw">May 23, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/writing-a-template-library-in-python.htmlSun, 23 May 2021 00:00:00 +0000
- Learning a new codebase: hacking on nginxhttp://notes.eatonphil.com/learning-a-new-codebase-hacking-nginx.html<p>I have never contributed to nginx. My C skills are 1/10. But
downloading the source, hacking it up, compiling it, and running it
doesn't scare me. This post is to help you overcome your own fears
about doing so. Not necessarily because you should be running
out-of-tree diffs in production but because I see a lot of developers
never even consider looking at the source of a big tool or dependency
they use.</p>
<p>Most of all, studying mature software projects is one of the best ways
to grow as a programmer.</p>
<h3 id="source-and-build">Source and build</h3><p>At a high level, the steps for hacking on software projects are always
the same:</p>
<ol>
<li>Find/download the source code</li>
<li>Install necessary dependency libraries/compilers</li>
<li>Start grepping around based on something you see in the output or capabilities you know exist</li>
<li>Make a change</li>
<li>Run some variation of <code>./configure && make</code> to build</li>
<li>Run the program</li>
<li>Go back to step 4 until you're happy</li>
</ol>
<h3 id="nginx">nginx</h3><p>Let's follow these steps for nginx. We google <code>nginx github</code>
to learn that there's a read-only copy of the source on
<a href="https://github.com/nginx/nginx">GitHub</a>.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>mkdir<span class="w"> </span>~/vendor
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>~/vendor
$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/nginx/nginx
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>nginx
</pre></div>
<p>There's no readme, bummer. We google <code>nginx build from
source</code> and find
<a href="http://nginx.org/en/docs/configure.html">this</a>. We see it's a typical
C project that builds exactly as guessed: <code>./configure &&
make</code>. And it doesn't look like it has any third-party
dependencies besides my C compiler.</p>
<p>Install autoconf, gmake, and a C compiler. There's no <code>./configure</code>
file in this directory but notice there is a <code>configure</code> file in
<code>auto</code>. Trying <code>cd auto && ./configure</code> crashes so let's try
<code>./auto/configure</code>. That seems to do it except for this error:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./auto/configure
...
./auto/configure:<span class="w"> </span>error:<span class="w"> </span>the<span class="w"> </span>HTTP<span class="w"> </span>rewrite<span class="w"> </span>module<span class="w"> </span>requires<span class="w"> </span>the<span class="w"> </span>PCRE<span class="w"> </span>library.
You<span class="w"> </span>can<span class="w"> </span>either<span class="w"> </span>disable<span class="w"> </span>the<span class="w"> </span>module<span class="w"> </span>by<span class="w"> </span>using<span class="w"> </span>--without-http_rewrite_module
option,<span class="w"> </span>or<span class="w"> </span>install<span class="w"> </span>the<span class="w"> </span>PCRE<span class="w"> </span>library<span class="w"> </span>into<span class="w"> </span>the<span class="w"> </span>system,<span class="w"> </span>or<span class="w"> </span>build<span class="w"> </span>the<span class="w"> </span>PCRE<span class="w"> </span>library
statically<span class="w"> </span>from<span class="w"> </span>the<span class="w"> </span><span class="nb">source</span><span class="w"> </span>with<span class="w"> </span>nginx<span class="w"> </span>by<span class="w"> </span>using<span class="w"> </span>--with-pcre<span class="o">=</span><path><span class="w"> </span>option.
</pre></div>
<p>Run <code>./auto/configure --without-http_rewrite_module</code>. When that
fails too, run it again with <code>--without-http_gzip_module</code> added.</p>
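<p>Ok, configure is done. Now we've got a Makefile. Run <code>make -j</code> to
compile in parallel. (With no number, <code>-j</code> puts no limit on the number
of jobs; <code>make -j$(nproc)</code> would cap it at the number of cores.)</p>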
<p>Run <code>git status</code> to see where the binary was placed. Run <code>ls objs</code> and
there it is, great:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>ls<span class="w"> </span>objs
autoconf.err<span class="w"> </span>nginx<span class="w"> </span>ngx_auto_config.h<span class="w"> </span>ngx_modules.c<span class="w"> </span>src
Makefile<span class="w"> </span>nginx.8<span class="w"> </span>ngx_auto_headers.h<span class="w"> </span>ngx_modules.o
</pre></div>
<h3 id="the-hack">The hack</h3><p>We want a simple <code>dump</code> command that will return a literal string in a
<code>location</code> block. So something like this:</p>
<div class="highlight"><pre><span></span>$ diff --git a/conf/nginx.conf b/conf/nginx.conf
<span class="gh">index 29bc085f..e96e817f 100644</span>
<span class="gd">--- a/conf/nginx.conf</span>
<span class="gi">+++ b/conf/nginx.conf</span>
<span class="gu">@@ -41,8 +41,7 @@ http {</span>
#access_log logs/host.access.log main;
location / {
<span class="gd">- root html;</span>
<span class="gd">- index index.html index.htm;</span>
<span class="gi">+ dump 'It was a good Thursday.';</span>
<span class="w"> </span> }
<span class="w"> </span> #error_page 404 /404.html;
}
</pre></div>
<p>Now that we've built nginx we can use the <code>-t</code> flag to test the
validity of this config:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./objs/nginx<span class="w"> </span>-t<span class="w"> </span>-c<span class="w"> </span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/conf/nginx.conf
nginx:<span class="w"> </span><span class="o">[</span>alert<span class="o">]</span><span class="w"> </span>could<span class="w"> </span>not<span class="w"> </span>open<span class="w"> </span>error<span class="w"> </span>log<span class="w"> </span>file:<span class="w"> </span>open<span class="o">()</span><span class="w"> </span><span class="s2">"/usr/local/nginx/logs/error.log"</span><span class="w"> </span>failed<span class="w"> </span><span class="o">(</span><span class="m">2</span>:<span class="w"> </span>No<span class="w"> </span>such<span class="w"> </span>file<span class="w"> </span>or<span class="w"> </span>directory<span class="o">)</span>
<span class="m">2021</span>/04/04<span class="w"> </span><span class="m">21</span>:24:09<span class="w"> </span><span class="o">[</span>emerg<span class="o">]</span><span class="w"> </span><span class="m">1030951</span><span class="c1">#0: unknown directive "dump" in /home/phil/vendor/nginx/conf/nginx.conf:44</span>
nginx:<span class="w"> </span>configuration<span class="w"> </span>file<span class="w"> </span>/home/phil/vendor/nginx/conf/nginx.conf<span class="w"> </span><span class="nb">test</span><span class="w"> </span>failed
</pre></div>
<p>And now we've got something to go on! Clearly we have to register this
directive and the log gives us enough info to start grepping:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span><span class="s1">'unknown directive'</span>
src/core/ngx_conf_file.c:<span class="w"> </span><span class="s2">"unknown directive \"%s\""</span>,<span class="w"> </span>name->data<span class="o">)</span><span class="p">;</span>
</pre></div>
<p>The case that has this failing comes from line 463: <code>rv = cmd->set(cf, cmd, conf)</code>. So let's see what this <code>set</code> does. <code>git grep set</code> is useless. Let's try finding out what <code>cmd</code> is so we can locate the struct that has <code>set</code> on it. Ah, it's an <code>ngx_command_t</code>. Since there's no <code>struct</code> keyword in front of it, it must be typedef-ed, and the typedef will likely have a <code>;</code> right after the name. So <code>git grep ngx_command_t\;</code> finds us:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span>ngx_command_t<span class="se">\;</span>
src/core/ngx_core.h:typedef<span class="w"> </span>struct<span class="w"> </span>ngx_command_s<span class="w"> </span>ngx_command_t<span class="p">;</span>
</pre></div>
<p>Which means the struct body is defined somewhere else, so grep for <code>ngx_command_s</code>:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span>ngx_command_s
src/core/ngx_conf_file.h:struct<span class="w"> </span>ngx_command_s<span class="w"> </span><span class="o">{</span>
src/core/ngx_core.h:typedef<span class="w"> </span>struct<span class="w"> </span>ngx_command_s<span class="w"> </span>ngx_command_t<span class="p">;</span>
</pre></div>
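<p>For anyone less used to C, the pattern those two greps uncovered looks roughly like the sketch below. The <code>demo_</code> names are made up for illustration and the fields are not nginx's real ones. Only the shape matches what the grep output shows: a typedef in one header, the struct body in another.</p>

```c
#include <stddef.h>

/* Like ngx_core.h: only the name is announced here, so other headers
 * can pass around pointers to demo_command_t without seeing its fields. */
typedef struct demo_command_s demo_command_t;

/* Like ngx_conf_file.h: the struct body is defined later, somewhere else.
 * These fields are hypothetical, not nginx's real layout. */
struct demo_command_s {
    const char *name;
    int (*set)(demo_command_t *cmd, const char *arg);
};

/* A toy handler: accept any non-empty argument. */
static int demo_set_nonempty(demo_command_t *cmd, const char *arg)
{
    (void) cmd;
    return (arg != NULL && arg[0] != '\0') ? 0 : -1;
}

/* Calls through the function pointer, much like rv = cmd->set(cf, cmd, conf). */
static int demo_run(demo_command_t *cmd, const char *arg)
{
    return cmd->set(cmd, arg);
}
```

<p>So grepping for the <code>_t</code> name finds the typedef, and grepping for the <code>_s</code> name finds the fields.</p>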
<p>Ok, chasing the struct is going nowhere. Different approach. What command did we remove?</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff
diff<span class="w"> </span>--git<span class="w"> </span>a/conf/nginx.conf<span class="w"> </span>b/conf/nginx.conf
index<span class="w"> </span>29bc085f..e96e817f<span class="w"> </span><span class="m">100644</span>
---<span class="w"> </span>a/conf/nginx.conf
+++<span class="w"> </span>b/conf/nginx.conf
@@<span class="w"> </span>-41,8<span class="w"> </span>+41,7<span class="w"> </span>@@<span class="w"> </span>http<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="c1">#access_log logs/host.access.log main;</span>
<span class="w"> </span>location<span class="w"> </span>/<span class="w"> </span><span class="o">{</span>
-<span class="w"> </span>root<span class="w"> </span>html<span class="p">;</span>
-<span class="w"> </span>index<span class="w"> </span>index.html<span class="w"> </span>index.htm<span class="p">;</span>
+<span class="w"> </span>dump<span class="w"> </span><span class="s1">'It was a good Thursday.'</span><span class="p">;</span>
<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="c1">#error_page 404 /404.html;</span>
</pre></div>
<p><code>root</code> is a command. Maybe we can copy that.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span><span class="se">\"</span>root<span class="se">\"</span>
docs/xml/nginx/changes.xml:in<span class="w"> </span>the<span class="w"> </span><span class="s2">"root"</span><span class="w"> </span>or<span class="w"> </span><span class="s2">"auth_basic_user_file"</span><span class="w"> </span>directives.
docs/xml/nginx/changes.xml:a<span class="w"> </span>request<span class="w"> </span>was<span class="w"> </span>handled<span class="w"> </span>incorrectly,<span class="w"> </span><span class="k">if</span><span class="w"> </span>a<span class="w"> </span><span class="s2">"root"</span><span class="w"> </span>directive<span class="w"> </span>used<span class="w"> </span>variables<span class="p">;</span>
docs/xml/nginx/changes.xml:the<span class="w"> </span><span class="nv">$document_root</span><span class="w"> </span>variable<span class="w"> </span>usage<span class="w"> </span><span class="k">in</span><span class="w"> </span>the<span class="w"> </span><span class="s2">"root"</span><span class="w"> </span>and<span class="w"> </span><span class="s2">"alias"</span><span class="w"> </span>directives
docs/xml/nginx/changes.xml:the<span class="w"> </span><span class="nv">$document_root</span><span class="w"> </span>variable<span class="w"> </span>did<span class="w"> </span>not<span class="w"> </span>support<span class="w"> </span>the<span class="w"> </span>variables<span class="w"> </span><span class="k">in</span><span class="w"> </span>the<span class="w"> </span><span class="s2">"root"</span>
docs/xml/nginx/changes.xml:if<span class="w"> </span>a<span class="w"> </span><span class="s2">"root"</span><span class="w"> </span>was<span class="w"> </span>specified<span class="w"> </span>by<span class="w"> </span>variable<span class="w"> </span>only,<span class="w"> </span><span class="k">then</span><span class="w"> </span>the<span class="w"> </span>root<span class="w"> </span>was<span class="w"> </span>relative
src/http/ngx_http_core_module.c:<span class="w"> </span><span class="o">{</span><span class="w"> </span>ngx_string<span class="o">(</span><span class="s2">"root"</span><span class="o">)</span>,
src/http/ngx_http_core_module.c:<span class="w"> </span><span class="p">&</span>cmd->name,<span class="w"> </span>clcf->alias<span class="w"> </span>?<span class="w"> </span><span class="s2">"alias"</span><span class="w"> </span>:<span class="w"> </span><span class="s2">"root"</span><span class="o">)</span><span class="p">;</span>
</pre></div>
<p>That looks more promising. Let's copy that:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff<span class="w"> </span>src/http/
diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c
index<span class="w"> </span>9b94b328..17a64e80<span class="w"> </span><span class="m">100644</span>
---<span class="w"> </span>a/src/http/ngx_http_core_module.c
+++<span class="w"> </span>b/src/http/ngx_http_core_module.c
@@<span class="w"> </span>-331,6<span class="w"> </span>+331,14<span class="w"> </span>@@<span class="w"> </span>static<span class="w"> </span>ngx_command_t<span class="w"> </span>ngx_http_core_commands<span class="o">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="m">0</span>,
<span class="w"> </span>NULL<span class="w"> </span><span class="o">}</span>,
+<span class="w"> </span><span class="o">{</span><span class="w"> </span>ngx_string<span class="o">(</span><span class="s2">"dump"</span><span class="o">)</span>,
+<span class="w"> </span>NGX_HTTP_MAIN_CONF<span class="p">|</span>NGX_HTTP_SRV_CONF<span class="p">|</span>NGX_HTTP_LOC_CONF<span class="p">|</span>NGX_HTTP_LIF_CONF
+<span class="w"> </span><span class="p">|</span>NGX_CONF_TAKE1,
+<span class="w"> </span>ngx_http_core_dump,
+<span class="w"> </span>NGX_HTTP_LOC_CONF_OFFSET,
+<span class="w"> </span><span class="m">0</span>,
+<span class="w"> </span>NULL<span class="w"> </span><span class="o">}</span>,
+
<span class="w"> </span><span class="o">{</span><span class="w"> </span>ngx_string<span class="o">(</span><span class="s2">"alias"</span><span class="o">)</span>,
<span class="w"> </span>NGX_HTTP_LOC_CONF<span class="p">|</span>NGX_CONF_TAKE1,
<span class="w"> </span>ngx_http_core_root,
</pre></div>
<p>Ok so this is how a command is registered. It obviously won't build without <code>ngx_http_core_dump</code> so let's implement that by copying/renaming <code>ngx_http_core_root</code>:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff<span class="w"> </span>src
diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c
index<span class="w"> </span>9b94b328..c184dab5<span class="w"> </span><span class="m">100644</span>
---<span class="w"> </span>a/src/http/ngx_http_core_module.c
+++<span class="w"> </span>b/src/http/ngx_http_core_module.c
@@<span class="w"> </span>-4402,6<span class="w"> </span>+4410,16<span class="w"> </span>@@<span class="w"> </span>ngx_http_core_root<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span>
<span class="o">}</span>
+static<span class="w"> </span>char<span class="w"> </span>*
+ngx_http_core_dump<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span>
+<span class="o">{</span>
+<span class="w"> </span>ngx_http_core_loc_conf_t<span class="w"> </span>*clcf<span class="w"> </span><span class="o">=</span><span class="w"> </span>conf<span class="p">;</span>
+<span class="w"> </span>ngx_str_t<span class="w"> </span>*value<span class="w"> </span><span class="o">=</span><span class="w"> </span>cf->args->elts<span class="p">;</span>
+<span class="w"> </span>clcf->dump<span class="w"> </span><span class="o">=</span><span class="w"> </span>value<span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="p">;</span>
+<span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_CONF_OK<span class="p">;</span>
+<span class="o">}</span>
+
+
static<span class="w"> </span>ngx_http_method_name_t<span class="w"> </span>ngx_methods_names<span class="o">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="o">{</span><span class="w"> </span><span class="o">(</span>u_char<span class="w"> </span>*<span class="o">)</span><span class="w"> </span><span class="s2">"GET"</span>,<span class="w"> </span><span class="o">(</span>uint32_t<span class="o">)</span><span class="w"> </span>~NGX_HTTP_GET<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="o">{</span><span class="w"> </span><span class="o">(</span>u_char<span class="w"> </span>*<span class="o">)</span><span class="w"> </span><span class="s2">"HEAD"</span>,<span class="w"> </span><span class="o">(</span>uint32_t<span class="o">)</span><span class="w"> </span>~NGX_HTTP_HEAD<span class="w"> </span><span class="o">}</span>,
</pre></div>
<p>The goal here is to just store the dump string on this conf
object. Then while serving the request we can check if this is set and
if so, respond to the request with this string.</p>
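<p>Put together, the registration entry and the handler follow a table-driven pattern. Here is a minimal sketch of it in plain C, with made-up <code>demo_</code> types standing in for nginx's (the real <code>ngx_command_t</code> has more fields and the real config parser is far more involved). It also hints at where an "unknown directive" error comes from: no row in the table matched.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the location conf struct. */
typedef struct {
    const char *root;
    const char *dump;
} demo_loc_conf_t;

/* Hypothetical stand-in for a command entry: a name plus a setter. */
typedef struct {
    const char *name;
    const char *(*set)(demo_loc_conf_t *conf, const char *arg);
} demo_command_t;

/* In this sketch, setters return NULL on success or an error message. */
static const char *demo_set_root(demo_loc_conf_t *conf, const char *arg)
{
    conf->root = arg;
    return NULL;
}

static const char *demo_set_dump(demo_loc_conf_t *conf, const char *arg)
{
    conf->dump = arg;
    return NULL;
}

/* The table: adding a directive means adding a row. */
static const demo_command_t demo_commands[] = {
    { "root", demo_set_root },
    { "dump", demo_set_dump },
    { NULL, NULL }  /* terminator */
};

/* Look up a directive by name and run its setter. */
static const char *demo_apply(demo_loc_conf_t *conf, const char *name,
                              const char *arg)
{
    const demo_command_t *cmd;

    for (cmd = demo_commands; cmd->name != NULL; cmd++) {
        if (strcmp(cmd->name, name) == 0) {
            return cmd->set(conf, arg);
        }
    }
    return "unknown directive";
}
```

<p>Calling <code>demo_apply(&conf, "dump", "It was a good Thursday.")</code> stores the string on <code>conf</code>, while a name with no row in the table comes back as an error.</p>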
<p>Our change still clearly won't build because we haven't added a <code>dump</code>
member to this conf struct yet. But let's run <code>make</code> anyway.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>make
make<span class="w"> </span>-f<span class="w"> </span>objs/Makefile
make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>Entering<span class="w"> </span>directory<span class="w"> </span><span class="s1">'/home/phil/vendor/nginx'</span>
cc<span class="w"> </span>-c<span class="w"> </span>-pipe<span class="w"> </span>-O<span class="w"> </span>-W<span class="w"> </span>-Wall<span class="w"> </span>-Wpointer-arith<span class="w"> </span>-Wno-unused-parameter<span class="w"> </span>-Werror<span class="w"> </span>-g<span class="w"> </span>-I<span class="w"> </span>src/core<span class="w"> </span>-I<span class="w"> </span>src/event<span class="w"> </span>-I<span class="w"> </span>src/event/modules<span class="w"> </span>-I<span class="w"> </span>src/os/unix<span class="w"> </span>-I<span class="w"> </span>objs<span class="w"> </span>-I<span class="w"> </span>src/http<span class="w"> </span>-I<span class="w"> </span>src/http/modules<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-o<span class="w"> </span>objs/src/http/ngx_http_core_module.o<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>src/http/ngx_http_core_module.c
src/http/ngx_http_core_module.c:337:7:<span class="w"> </span>error:<span class="w"> </span>ngx_http_core_dump<span class="w"> </span>undeclared<span class="w"> </span>here<span class="w"> </span><span class="o">(</span>not<span class="w"> </span><span class="k">in</span><span class="w"> </span>a<span class="w"> </span><span class="k">function</span><span class="o">)</span><span class="p">;</span><span class="w"> </span>did<span class="w"> </span>you<span class="w"> </span>mean<span class="w"> </span>ngx_http_core_type?
<span class="m">337</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>ngx_http_core_dump,
<span class="w"> </span><span class="p">|</span><span class="w"> </span>^~~~~~~~~~~~~~~~~~~~~
<span class="w"> </span><span class="p">|</span><span class="w"> </span>ngx_http_core_type
src/http/ngx_http_core_module.c:<span class="w"> </span>In<span class="w"> </span><span class="k">function</span><span class="w"> </span>ngx_http_core_dump:
src/http/ngx_http_core_module.c:4418:9:<span class="w"> </span>error:<span class="w"> </span>ngx_http_core_loc_conf_t<span class="w"> </span><span class="o">{</span>aka<span class="w"> </span>struct<span class="w"> </span>ngx_http_core_loc_conf_s<span class="o">}</span><span class="w"> </span>has<span class="w"> </span>no<span class="w"> </span>member<span class="w"> </span>named<span class="w"> </span>dump
<span class="m">4418</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>clcf->dump<span class="w"> </span><span class="o">=</span><span class="w"> </span>value<span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="p">;</span>
<span class="w"> </span><span class="p">|</span><span class="w"> </span>^~
src/http/ngx_http_core_module.c:4418:5:<span class="w"> </span>error:<span class="w"> </span>statement<span class="w"> </span>with<span class="w"> </span>no<span class="w"> </span>effect<span class="w"> </span><span class="o">[</span>-Werror<span class="o">=</span>unused-value<span class="o">]</span>
<span class="m">4418</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>clcf->dump<span class="w"> </span><span class="o">=</span><span class="w"> </span>value<span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="p">;</span>
<span class="w"> </span><span class="p">|</span><span class="w"> </span>^~~~
At<span class="w"> </span>top<span class="w"> </span>level:
src/http/ngx_http_core_module.c:4414:1:<span class="w"> </span>error:<span class="w"> </span>ngx_http_core_dump<span class="w"> </span>defined<span class="w"> </span>but<span class="w"> </span>not<span class="w"> </span>used<span class="w"> </span><span class="o">[</span>-Werror<span class="o">=</span>unused-function<span class="o">]</span>
<span class="m">4414</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>ngx_http_core_dump<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span>
<span class="w"> </span><span class="p">|</span><span class="w"> </span>^~~~~~~~~~~~~~~~~~~~~
cc1:<span class="w"> </span>all<span class="w"> </span>warnings<span class="w"> </span>being<span class="w"> </span>treated<span class="w"> </span>as<span class="w"> </span>errors
make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>***<span class="w"> </span><span class="o">[</span>objs/Makefile:834:<span class="w"> </span>objs/src/http/ngx_http_core_module.o<span class="o">]</span><span class="w"> </span>Error<span class="w"> </span><span class="m">1</span>
make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>Leaving<span class="w"> </span>directory<span class="w"> </span><span class="s1">'/home/phil/vendor/nginx'</span>
make:<span class="w"> </span>***<span class="w"> </span><span class="o">[</span>Makefile:10:<span class="w"> </span>build<span class="o">]</span><span class="w"> </span>Error<span class="w"> </span><span class="m">2</span>
</pre></div>
<p>The dump handler is undeclared. While copying <code>ngx_http_core_root</code> earlier I saw that there was a forward declaration toward the top. Let's copy that as well and see if that fixes anything.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff
diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c
index<span class="w"> </span>9b94b328..430e1256<span class="w"> </span><span class="m">100644</span>
---<span class="w"> </span>a/src/http/ngx_http_core_module.c
+++<span class="w"> </span>b/src/http/ngx_http_core_module.c
@@<span class="w"> </span>-56,6<span class="w"> </span>+56,7<span class="w"> </span>@@<span class="w"> </span>static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_listen<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,
static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_server_name<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,
<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span><span class="p">;</span>
static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_root<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span><span class="p">;</span>
+static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_dump<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span><span class="p">;</span>
<span class="w"> </span>static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_limit_except<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,
<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span><span class="p">;</span>
<span class="w"> </span>static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_set_aio<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,
</pre></div>
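<p>The reason the prototype matters: the command table is a static initializer near the top of the file, and it names the handler before the compiler has seen its definition. A stripped-down sketch of the same situation, with hypothetical <code>demo_</code> names:</p>

```c
#include <stddef.h>

/* Forward declaration: without this line, the table below fails to
 * compile with an "undeclared" error like the one above. */
static int demo_dump(const char *arg);

/* The table refers to demo_dump before its body appears in the file. */
static int (*const demo_handlers[])(const char *) = {
    demo_dump,
};

/* The definition can now live anywhere later in the file. */
static int demo_dump(const char *arg)
{
    return arg != NULL;
}
```

<p>Deleting the first declaration in this sketch should reproduce the same kind of error, just on a much smaller file.</p>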
<p>And build again:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>make
make<span class="w"> </span>-f<span class="w"> </span>objs/Makefile
make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>Entering<span class="w"> </span>directory<span class="w"> </span><span class="s1">'/home/phil/vendor/nginx'</span>
cc<span class="w"> </span>-c<span class="w"> </span>-pipe<span class="w"> </span>-O<span class="w"> </span>-W<span class="w"> </span>-Wall<span class="w"> </span>-Wpointer-arith<span class="w"> </span>-Wno-unused-parameter<span class="w"> </span>-Werror<span class="w"> </span>-g<span class="w"> </span>-I<span class="w"> </span>src/core<span class="w"> </span>-I<span class="w"> </span>src/event<span class="w"> </span>-I<span class="w"> </span>src/event/modules<span class="w"> </span>-I<span class="w"> </span>src/os/unix<span class="w"> </span>-I<span class="w"> </span>objs<span class="w"> </span>-I<span class="w"> </span>src/http<span class="w"> </span>-I<span class="w"> </span>src/http/modules<span class="w"> </span><span class="se">\</span>
<span class="w"> </span>-o<span class="w"> </span>objs/src/http/ngx_http_core_module.o<span class="w"> </span><span class="se">\</span>
src/http/ngx_http_core_module.c
src/http/ngx_http_core_module.c:<span class="w"> </span>In<span class="w"> </span><span class="k">function</span><span class="w"> </span>ngx_http_core_dump:
src/http/ngx_http_core_module.c:4419:9:<span class="w"> </span>error:<span class="w"> </span>ngx_http_core_loc_conf_t<span class="w"> </span><span class="o">{</span>aka<span class="w"> </span>struct<span class="w"> </span>ngx_http_core_loc_conf_s<span class="o">}</span><span class="w"> </span>has<span class="w"> </span>no<span class="w"> </span>member<span class="w"> </span>named<span class="w"> </span>dump
<span class="m">4419</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>clcf->dump<span class="w"> </span><span class="o">=</span><span class="w"> </span>value<span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="p">;</span>
<span class="w"> </span><span class="p">|</span><span class="w"> </span>^~
make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>***<span class="w"> </span><span class="o">[</span>objs/Makefile:834:<span class="w"> </span>objs/src/http/ngx_http_core_module.o<span class="o">]</span><span class="w"> </span>Error<span class="w"> </span><span class="m">1</span>
make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>Leaving<span class="w"> </span>directory<span class="w"> </span><span class="s1">'/home/phil/vendor/nginx'</span>
make:<span class="w"> </span>***<span class="w"> </span><span class="o">[</span>Makefile:10:<span class="w"> </span>build<span class="o">]</span><span class="w"> </span>Error<span class="w"> </span><span class="m">2</span>
</pre></div>
<p>Perfect. Now let's add <code>dump</code> as a member to this conf object.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span>ngx_http_core_loc_conf_t<span class="se">\;</span>
src/http/ngx_http_core_module.h:typedef<span class="w"> </span>struct<span class="w"> </span>ngx_http_core_loc_conf_s<span class="w"> </span>ngx_http_core_loc_conf_t<span class="p">;</span>
</pre></div>
<p>Let's just clone the <code>root</code> member:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.h<span class="w"> </span>b/src/http/ngx_http_core_module.h
index<span class="w"> </span>2aadae7f..6b1b178b<span class="w"> </span><span class="m">100644</span>
---<span class="w"> </span>a/src/http/ngx_http_core_module.h
+++<span class="w"> </span>b/src/http/ngx_http_core_module.h
@@<span class="w"> </span>-333,6<span class="w"> </span>+333,7<span class="w"> </span>@@<span class="w"> </span>struct<span class="w"> </span>ngx_http_core_loc_conf_s<span class="w"> </span><span class="o">{</span>
/*<span class="w"> </span>location<span class="w"> </span>name<span class="w"> </span>length<span class="w"> </span><span class="k">for</span><span class="w"> </span>inclusive<span class="w"> </span>location<span class="w"> </span>with<span class="w"> </span>inherited<span class="w"> </span><span class="nb">alias</span><span class="w"> </span>*/
<span class="w"> </span>size_t<span class="w"> </span>alias<span class="p">;</span>
<span class="w"> </span>ngx_str_t<span class="w"> </span>root<span class="p">;</span><span class="w"> </span>/*<span class="w"> </span>root,<span class="w"> </span><span class="nb">alias</span><span class="w"> </span>*/
+<span class="w"> </span>ngx_str_t<span class="w"> </span>dump<span class="p">;</span>
<span class="w"> </span>ngx_str_t<span class="w"> </span>post_action<span class="p">;</span>
<span class="w"> </span>ngx_array_t<span class="w"> </span>*root_lengths<span class="p">;</span>
</pre></div>
<p>Run <code>make</code> and it succeeds!</p>
<p>Now we spend a few hours looking around for a good place to add a hook
during a request. Ultimately, <code>ngx_http_core_find_config_phase</code> seems
like a good place since only then will we be dealing with the struct
we added <code>dump</code> to.</p>
<p>Next step is figuring out how to send a response. Grepping for
<code>response</code> isn't super useful, neither is <code>write</code>. But <code>send</code> has some
pretty low-level but obvious behavior.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span>send<span class="se">\(</span>
src/mail/ngx_mail.h:void<span class="w"> </span>ngx_mail_send<span class="o">(</span>ngx_event_t<span class="w"> </span>*wev<span class="o">)</span><span class="p">;</span>
src/mail/ngx_mail_auth_http_module.c:<span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>ngx_send<span class="o">(</span>c,<span class="w"> </span>ctx->request->pos,<span class="w"> </span>size<span class="o">)</span><span class="p">;</span>
...
</pre></div>
<p>That second result looks promising. Reading that file, it seems
we need an object that has a <code>->data</code> member. In
<code>src/http/ngx_http_core_module.c</code> I noticed that the request object
has a member that looks interesting: <code>r->connection->write->data</code>. And
going by its signature, <code>ngx_send</code> also needs a
string and a length.</p>
<p>Thankfully we already have that from our <code>dump</code> member. So let's try something simple:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff
diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c
index<span class="w"> </span>9b94b328..bd58788b<span class="w"> </span><span class="m">100644</span>
---<span class="w"> </span>a/src/http/ngx_http_core_module.c
+++<span class="w"> </span>b/src/http/ngx_http_core_module.c
@@<span class="w"> </span>-989,6<span class="w"> </span>+996,11<span class="w"> </span>@@<span class="w"> </span>ngx_http_core_find_config_phase<span class="o">(</span>ngx_http_request_t<span class="w"> </span>*r,
<span class="w"> </span>ngx_http_finalize_request<span class="o">(</span>r,<span class="w"> </span>NGX_HTTP_REQUEST_ENTITY_TOO_LARGE<span class="o">)</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span>
<span class="o">}</span>
+
+<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">(</span>clcf->dump.len<span class="o">)</span><span class="w"> </span><span class="o">{</span>
+<span class="w"> </span>ngx_send<span class="o">(</span>r->connection->write->data,<span class="w"> </span>clcf->dump.data,<span class="w"> </span>clcf->dump.len<span class="o">)</span><span class="p">;</span>
+<span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span>
+<span class="w"> </span><span class="o">}</span>
</pre></div>
<p>Run <code>make</code> and it's good! Let's turn off daemon mode and the master process so nginx runs as a single foreground process that's easier to quit as we're iterating.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff<span class="w"> </span>conf/
diff<span class="w"> </span>--git<span class="w"> </span>a/conf/nginx.conf<span class="w"> </span>b/conf/nginx.conf
index<span class="w"> </span>29bc085f..7cce7d65<span class="w"> </span><span class="m">100644</span>
---<span class="w"> </span>a/conf/nginx.conf
+++<span class="w"> </span>b/conf/nginx.conf
@@<span class="w"> </span>-1,4<span class="w"> </span>+1,5<span class="w"> </span>@@
-
+daemon<span class="w"> </span>off<span class="p">;</span>
+master_process<span class="w"> </span>off<span class="p">;</span>
<span class="w"> </span><span class="c1">#user nobody;</span>
<span class="w"> </span>worker_processes<span class="w"> </span><span class="m">1</span><span class="p">;</span>
</pre></div>
<p>Now run <code>./objs/nginx -c $(pwd)/conf/nginx.conf</code>. Try to curl:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>localhost:2020
curl:<span class="w"> </span><span class="o">(</span><span class="m">1</span><span class="o">)</span><span class="w"> </span>Received<span class="w"> </span>HTTP/0.9<span class="w"> </span>when<span class="w"> </span>not<span class="w"> </span>allowed
</pre></div>
<p>Huh, that's unexpected. Let's try using telnet to get the whole raw
response:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>telnet<span class="w"> </span>localhost<span class="w"> </span><span class="m">2020</span>
Trying<span class="w"> </span>::1...
telnet:<span class="w"> </span>connect<span class="w"> </span>to<span class="w"> </span>address<span class="w"> </span>::1:<span class="w"> </span>Connection<span class="w"> </span>refused
Trying<span class="w"> </span><span class="m">127</span>.0.0.1...
Connected<span class="w"> </span>to<span class="w"> </span>localhost.
Escape<span class="w"> </span>character<span class="w"> </span>is<span class="w"> </span><span class="s1">'^]'</span>.
GET<span class="w"> </span>/
It<span class="w"> </span>was<span class="w"> </span>a<span class="w"> </span>good<span class="w"> </span>Thursday.
</pre></div>
<p>Oh man. That's super cool. Unfortunately it's also not valid HTTP. It
seems like if we're using <code>ngx_send</code> we'll have to set the HTTP
response headers manually.</p>
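<p>To make "valid HTTP" concrete, here is a minimal sketch in Python (not nginx code) of the bytes a client expects: a status line, optional headers, a blank line, then the body. The <code>build_response</code> helper and the <code>Content-Length</code> and <code>Connection</code> headers are illustrative assumptions, not something the patch in this post sends.</p>

```python
def build_response(body: bytes, status: str = "200 OK") -> bytes:
    """Prefix a body with a minimal HTTP/1.0 status line and headers.

    Hypothetical helper for illustration only; nginx does not work this way.
    """
    headers = [
        f"HTTP/1.0 {status}",
        f"Content-Length: {len(body)}",
        "Connection: close",
    ]
    # A blank line (CRLF CRLF) separates the headers from the body.
    head = "\r\n".join(headers) + "\r\n\r\n"
    return head.encode("ascii") + body


raw = build_response(b"It was a good Thursday.\n")
print(raw.decode("ascii"))
```

<p>Without the status line, curl has nothing to parse and falls back to treating the reply as HTTP/0.9, which matches the error above.</p>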
<p>If we're going to pass a literal string to <code>ngx_send</code> we'll have to
convert it to an <code>ngx_str_t</code>. Judging from <code>src/core/ngx_string.h</code> the
<code>ngx_string</code> macro should be able to do this.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff<span class="w"> </span>src
diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c
index<span class="w"> </span>9b94b328..1a1baccd<span class="w"> </span><span class="m">100644</span>
---<span class="w"> </span>a/src/http/ngx_http_core_module.c
+++<span class="w"> </span>b/src/http/ngx_http_core_module.c
@@<span class="w"> </span>-989,6<span class="w"> </span>+996,13<span class="w"> </span>@@<span class="w"> </span>ngx_http_core_find_config_phase<span class="o">(</span>ngx_http_request_t<span class="w"> </span>*r,
<span class="w"> </span>ngx_http_finalize_request<span class="o">(</span>r,<span class="w"> </span>NGX_HTTP_REQUEST_ENTITY_TOO_LARGE<span class="o">)</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span>
<span class="w"> </span><span class="o">}</span>
+
+<span class="w"> </span>static<span class="w"> </span>ngx_str_t<span class="w"> </span><span class="nv">header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>ngx_string<span class="o">(</span><span class="s2">"HTTP/1.0 200 OK\r\n\r\n"</span><span class="o">)</span><span class="p">;</span>
+<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">(</span>clcf->dump.len<span class="o">)</span><span class="w"> </span><span class="o">{</span>
+<span class="w"> </span>ngx_send<span class="o">(</span>r->connection->write->data,<span class="w"> </span>header.data,<span class="w"> </span>header.len<span class="o">)</span><span class="p">;</span>
+<span class="w"> </span>ngx_send<span class="o">(</span>r->connection->write->data,<span class="w"> </span>clcf->dump.data,<span class="w"> </span>clcf->dump.len<span class="o">)</span><span class="p">;</span>
+<span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span>
+<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">(</span><span class="nv">rc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span>NGX_DONE<span class="o">)</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>ngx_http_clear_location<span class="o">(</span>r<span class="o">)</span><span class="p">;</span>
<span class="o">}</span>
</pre></div>
<p>Compile, run and curl:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>localhost:2020
</pre></div>
<p>Huh. It's no longer complaining about HTTP/0.9 but it's now
hanging. Let's try verbose curling.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-vvv<span class="w"> </span>localhost:2020
*<span class="w"> </span>Trying<span class="w"> </span>::1:2020...
*<span class="w"> </span>connect<span class="w"> </span>to<span class="w"> </span>::1<span class="w"> </span>port<span class="w"> </span><span class="m">2020</span><span class="w"> </span>failed:<span class="w"> </span>Connection<span class="w"> </span>refused
*<span class="w"> </span>Trying<span class="w"> </span><span class="m">127</span>.0.0.1:2020...
*<span class="w"> </span>Connected<span class="w"> </span>to<span class="w"> </span>localhost<span class="w"> </span><span class="o">(</span><span class="m">127</span>.0.0.1<span class="o">)</span><span class="w"> </span>port<span class="w"> </span><span class="m">2020</span><span class="w"> </span><span class="o">(</span><span class="c1">#0)</span>
><span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1
><span class="w"> </span>Host:<span class="w"> </span>localhost:2020
><span class="w"> </span>User-Agent:<span class="w"> </span>curl/7.71.1
><span class="w"> </span>Accept:<span class="w"> </span>*/*
>
*<span class="w"> </span>Mark<span class="w"> </span>bundle<span class="w"> </span>as<span class="w"> </span>not<span class="w"> </span>supporting<span class="w"> </span>multiuse
*<span class="w"> </span>HTTP<span class="w"> </span><span class="m">1</span>.0,<span class="w"> </span>assume<span class="w"> </span>close<span class="w"> </span>after<span class="w"> </span>body
<<span class="w"> </span>HTTP/1.0<span class="w"> </span><span class="m">200</span><span class="w"> </span>OK
</pre></div>
<p>That's really weird. But I noticed there was a
<code>ngx_http_finalize_request</code> function that other parts of the code were
calling. Let's try adding that.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff<span class="w"> </span>src
diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c
index<span class="w"> </span>9b94b328..1a1baccd<span class="w"> </span><span class="m">100644</span>
---<span class="w"> </span>a/src/http/ngx_http_core_module.c
+++<span class="w"> </span>b/src/http/ngx_http_core_module.c
@@<span class="w"> </span>-989,6<span class="w"> </span>+996,14<span class="w"> </span>@@<span class="w"> </span>ngx_http_core_find_config_phase<span class="o">(</span>ngx_http_request_t<span class="w"> </span>*r,
<span class="w"> </span>ngx_http_finalize_request<span class="o">(</span>r,<span class="w"> </span>NGX_HTTP_REQUEST_ENTITY_TOO_LARGE<span class="o">)</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span>
<span class="w"> </span><span class="o">}</span>
+
+<span class="w"> </span>static<span class="w"> </span>ngx_str_t<span class="w"> </span><span class="nv">header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>ngx_string<span class="o">(</span><span class="s2">"HTTP/1.0 200 OK\r\n\r\n"</span><span class="o">)</span><span class="p">;</span>
+<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">(</span>clcf->dump.len<span class="o">)</span><span class="w"> </span><span class="o">{</span>
+<span class="w"> </span>ngx_send<span class="o">(</span>r->connection->write->data,<span class="w"> </span>header.data,<span class="w"> </span>header.len<span class="o">)</span><span class="p">;</span>
+<span class="w"> </span>ngx_send<span class="o">(</span>r->connection->write->data,<span class="w"> </span>clcf->dump.data,<span class="w"> </span>clcf->dump.len<span class="o">)</span><span class="p">;</span>
+<span class="w"> </span>ngx_http_finalize_request<span class="o">(</span>r,<span class="w"> </span>NGX_DONE<span class="o">)</span><span class="p">;</span>
+<span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span>
+<span class="w"> </span><span class="o">}</span>
</pre></div>
<p>Build, run, curl. Still hanging. Looking into the source code of
<code>ngx_http_finalize_request</code> it seems like there's a case where the
connection is completely closed if you pass in <code>NGX_HTTP_CLOSE</code>. Let's
try passing that instead of <code>NGX_DONE</code>.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>localhost:2020
It<span class="w"> </span>was<span class="w"> </span>a<span class="w"> </span>good<span class="w"> </span>Thursday.
</pre></div>
<p>Well hot dog, it works.</p>
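<p>One plausible reading of the earlier hang: an HTTP/1.0 response with no <code>Content-Length</code> only ends when the server closes the connection, which is what curl's "assume close after body" line hints at. The sketch below uses plain Python sockets rather than nginx to show a client reading until the peer closes. The names <code>serve_once</code> and <code>fetch</code> are made up for this illustration.</p>

```python
import socket
import threading

RESPONSE = b"HTTP/1.0 200 OK\r\n\r\nIt was a good Thursday.\n"


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, send the response, then close the socket."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(4096)  # read (and ignore) the request
        conn.sendall(RESPONSE)
    # Leaving the `with` block closes the connection. With no
    # Content-Length header, that close is how the client learns
    # the body has ended.


def fetch(port: int) -> bytes:
    """Send a bare request and read until the server closes the connection."""
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:  # an empty read means the peer closed
                break
            chunks.append(chunk)
    return b"".join(chunks)


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,))
server.start()
reply = fetch(port)
server.join()
listener.close()

print(reply.decode("ascii"))
```

<p>If <code>serve_once</code> kept the connection open after <code>sendall</code>, <code>fetch</code> would block in <code>recv</code> much like curl did.</p>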
<h3 id="reflection">Reflection</h3><p>Is this a good way to implement commands in nginx? No. While I knew a
bit about nginx modules as a user it's clear that as a developer this
command could have been implemented much more cleanly as a module too.</p>
<p>There also has to be higher-level tooling for constructing and returning
responses rather than writing out headers manually.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Been wanting to write some posts like this for a long time showing some techniques for hacking on an unfamiliar project using very basic programming/Linux tools. In this post it's nginx<a href="https://t.co/t7Y43Zmxhk">https://t.co/t7Y43Zmxhk</a> <a href="https://t.co/EOatURm5wx">pic.twitter.com/EOatURm5wx</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1378906317004361732?ref_src=twsrc%5Etfw">April 5, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/learning-a-new-codebase-hacking-nginx.htmlSun, 04 Apr 2021 00:00:00 +0000
- How to get better at recursionhttp://notes.eatonphil.com/practicing-recursion.html<p>tldr; reimplement standard library functions in your favorite
language <em>without loops</em>.</p>
<h3 id="background">Background</h3><p>For a few years after college I spent a lot of free time doing
projects in Standard ML and Scheme. As a result I got really
comfortable doing recursion. The two big reasons for this are 1)
neither Standard ML nor Scheme has loops and 2) they both have very
small standard libraries. (Ok, they have loops. They're just so
limited as to be useless.)</p>
<p>I ended up building <a href="https://github.com/eatonphil/ponyo">a standard
library</a> for Standard ML including
string functions (contains, indexOf, count, replace, etc.), an HTTP
server and client, a hash table, a binary search tree, parts of a
Standard ML parser, and <a href="https://ponyo.org/reference">so on</a>.</p>
<p>All of this without loops.</p>
<h3 id="strategy">Strategy</h3><p>The good news (if you don't want to learn a new language) is that you
don't have to take up Standard ML or Scheme to get better at
recursion. But you do need to dedicate some time to <em>practicing
recursion</em> to get better at it.</p>
<p>My recommendation would be to pick 10-20 string or array functions out
of your favorite language's standard library and reimplement them
without loops. (Obviously, start simple and just pick one. But
don't stop there.)</p>
<h3 id="some-examples">Some examples</h3><p>Here's an example reimplementation of <code>indexOf</code> in
JavaScript:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">indexOf</span><span class="p">(</span><span class="nx">input</span><span class="p">,</span><span class="w"> </span><span class="nx">toMatch</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">helper</span><span class="p">(</span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="p">,</span><span class="w"> </span><span class="nx">test</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">index</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">input</span><span class="p">.</span><span class="nx">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">-</span><span class="mf">1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">toMatch</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">test</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">index</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">input</span><span class="p">[</span><span class="nx">index</span><span class="o">+</span><span class="nx">offset</span><span class="p">]</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="nx">toMatch</span><span class="p">[</span><span class="nx">offset</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">test</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="nx">toMatch</span><span class="p">.</span><span class="nx">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">helper</span><span class="p">(</span><span class="nx">index</span><span class="o">+</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s2">""</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">helper</span><span class="p">(</span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="o">+</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="nx">test</span><span class="o">+</span><span class="nx">input</span><span class="p">[</span><span class="nx">index</span><span class="o">+</span><span class="nx">offset</span><span class="p">]);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">helper</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s2">""</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Or here's an example immutable reimplementation of <code>insert</code>
in Python:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">insert</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">index</span><span class="p">,</span> <span class="n">item</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">helper</span><span class="p">(</span><span class="n">currentIndex</span><span class="p">,</span> <span class="n">accum</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">currentIndex</span> <span class="o">==</span> <span class="nb">len</span><span class="p">(</span><span class="n">arr</span><span class="p">):</span>
            <span class="k">return</span> <span class="n">accum</span>
        <span class="k">if</span> <span class="n">currentIndex</span> <span class="o"><</span> <span class="n">index</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">helper</span><span class="p">(</span><span class="n">currentIndex</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">accum</span> <span class="o">+</span> <span class="p">[</span><span class="n">arr</span><span class="p">[</span><span class="n">currentIndex</span><span class="p">]])</span>
        <span class="k">if</span> <span class="n">currentIndex</span> <span class="o">==</span> <span class="n">index</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">helper</span><span class="p">(</span><span class="n">currentIndex</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">accum</span> <span class="o">+</span> <span class="p">[</span><span class="n">item</span><span class="p">,</span> <span class="n">arr</span><span class="p">[</span><span class="n">currentIndex</span><span class="p">]])</span>
        <span class="k">return</span> <span class="n">helper</span><span class="p">(</span><span class="n">currentIndex</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">accum</span> <span class="o">+</span> <span class="p">[</span><span class="n">arr</span><span class="p">[</span><span class="n">currentIndex</span><span class="p">]])</span>
    <span class="k">return</span> <span class="n">helper</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="p">[])</span>
</pre></div>
<p class="note">
You're going to find an edge case and that's alright. The
important part at the moment is practicing recursion.
</p><p>For bonus points, avoid all mutation in your implementations and use
only tail recursion.</p>
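<p>As a sketch of what that bonus round can look like, here is a loop-free, mutation-free <code>reverse</code> in Python where the recursive call is in tail position. The function is an illustrative example only. Note that CPython does not optimize tail calls, so very long inputs will still hit the recursion limit.</p>

```python
def reverse(items):
    """Return a reversed copy of `items` without loops or mutation."""
    def helper(index, accum):
        if index == len(items):
            return accum
        # The recursive call is the last thing evaluated (tail position),
        # and a new list is built each step instead of mutating `accum`.
        return helper(index + 1, [items[index]] + accum)
    return helper(0, [])


print(reverse([1, 2, 3]))  # [3, 2, 1]
```

<p>The accumulator carries the partial result forward, so nothing is left to do after the recursive call returns.</p>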
<p>Happy recursion!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Reimplementing standard library functions without for loops is a great way to get better at recursion and you don't need to use a functional programming language to do so<a href="https://t.co/JiPnXMQW3l">https://t.co/JiPnXMQW3l</a> <a href="https://t.co/MHwX5t70HT">pic.twitter.com/MHwX5t70HT</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1368602496168497154?ref_src=twsrc%5Etfw">March 7, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/practicing-recursion.htmlSun, 07 Mar 2021 00:00:00 +0000
- Extending gosql to support LIMIT and OFFSEThttp://notes.eatonphil.com/extending-gosql-to-support-limit-and-offset.html<p>It's been a few months since I picked up
<a href="https://github.com/eatonphil/gosql">gosql</a> and I wanted to use it to
prototype a SQL interface for data stored in S3. But one critical
feature missing from gosql is <code>LIMIT</code> and <code>OFFSET</code> support. This post walks
through the few key changes to gosql to support <code>LIMIT</code> and <code>OFFSET</code>.</p>
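<p>As a reminder of the behavior being added, here is a small Python sketch of what <code>LIMIT</code> and <code>OFFSET</code> mean for a result set: skip <code>OFFSET</code> rows, then keep at most <code>LIMIT</code> rows. The <code>apply_limit_offset</code> helper is hypothetical and is not how gosql implements it.</p>

```python
def apply_limit_offset(rows, limit=None, offset=0):
    """Skip `offset` rows, then keep at most `limit` rows (all rows if None)."""
    skipped = rows[offset:]
    return skipped if limit is None else skipped[:limit]


rows = ["a", "b", "c", "d", "e"]
print(apply_limit_offset(rows, limit=2, offset=1))  # ['b', 'c']
```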
<p>You can find <a href="https://github.com/eatonphil/gosql/commit/9405e433ec51f8f1d72c9b2e8f45109d738edec4">this commit in full on
GitHub</a>.</p>
<p class="note">
This post builds on top of a series on building a SQL database from scratch in Golang.
<!-- forgive me, for I have sinned -->
<br />
<a href="/database-basics.html">1. SELECT, INSERT, CREATE and a REPL</a>
<br />
<a href="/database-basics-expressions-and-where.html">2. binary expressions and WHERE filters</a>
<br />
<a href="/database-basics-indexes.html">3. indexes</a>
<br />
<a href="/database-basics-a-database-sql-driver.html">4. a database/sql driver</a>
</p><h3 id="lexing">Lexing</h3><p>The first step is to update the lexer to know about the
<code>LIMIT</code> and <code>OFFSET</code> keywords. Since we already
have a generalized method of lexing any keywords from an array (see
<code>lexer.go:lexKeyword</code>), this is really easy. Just add a new
<code>Keyword</code>:</p>
<div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">37</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">37</span><span class="p">,</span><span class="mi">8</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">OnKeyword</span><span class="w"> </span><span class="nx">Keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"on"</span>
<span class="w"> </span><span class="nx">PrimarykeyKeyword</span><span class="w"> </span><span class="nx">Keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"primary key"</span>
<span class="w"> </span><span class="nx">NullKeyword</span><span class="w"> </span><span class="nx">Keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"null"</span>
<span class="o">+</span><span class="w"> </span><span class="nx">LimitKeyword</span><span class="w"> </span><span class="nx">Keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"limit"</span>
<span class="o">+</span><span class="w"> </span><span class="nx">OffsetKeyword</span><span class="w"> </span><span class="nx">Keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"offset"</span>
<span class="w"> </span><span class="p">)</span>
</pre></div>
<p>And then add these two new enums to the list of <code>Keyword</code>s
to lex:</p>
<div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">261</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">263</span><span class="p">,</span><span class="mi">8</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="nx">lexKeyword</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">OnKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="nx">PrimarykeyKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="nx">NullKeyword</span><span class="p">,</span>
<span class="o">+</span><span class="w"> </span><span class="nx">LimitKeyword</span><span class="p">,</span>
<span class="o">+</span><span class="w"> </span><span class="nx">OffsetKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
</pre></div>
<p>That's it for the lexer.</p>
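<p>To see why adding a keyword is this cheap, here is a rough Python sketch of list-driven keyword lexing. It assumes case-insensitive, longest-match behavior and is not a translation of gosql's <code>lexKeyword</code>. The <code>KEYWORDS</code> list and <code>lex_keyword</code> function are made up for illustration.</p>

```python
# Hypothetical keyword table; supporting a new keyword is one more entry.
KEYWORDS = ["select", "from", "where", "limit", "offset", "primary key"]


def lex_keyword(source, cursor):
    """Return (keyword, new_cursor) for the longest keyword at `cursor`, else None.

    Simplification: no word-boundary check, so "limitless" would match "limit".
    """
    rest = source[cursor:].lower()
    matches = [kw for kw in KEYWORDS if rest.startswith(kw)]
    if not matches:
        return None
    longest = max(matches, key=len)
    return longest, cursor + len(longest)


print(lex_keyword("LIMIT 10", 0))  # ('limit', 5)
```

<p>Because the matching logic only consults the table, the lexer itself never changes when the table grows.</p>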
<h3 id="parsing">Parsing</h3><p>Before we can parse limit and offset into the AST, we have to modify
our AST struct to support these two fields in ast.go:</p>
<div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">54</span><span class="p">,</span><span class="mi">9</span><span class="w"> </span><span class="o">+</span><span class="mi">54</span><span class="p">,</span><span class="mi">11</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">type</span><span class="w"> </span><span class="nx">SelectItem</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">type</span><span class="w"> </span><span class="nx">SelectStatement</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="o">-</span><span class="w"> </span><span class="nx">Item</span><span class="w"> </span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">SelectItem</span>
<span class="o">-</span><span class="w"> </span><span class="nx">From</span><span class="w"> </span><span class="o">*</span><span class="nx">Token</span>
<span class="o">-</span><span class="w"> </span><span class="nx">Where</span><span class="w"> </span><span class="o">*</span><span class="nx">Expression</span>
<span class="o">+</span><span class="w"> </span><span class="nx">Item</span><span class="w"> </span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">SelectItem</span>
<span class="o">+</span><span class="w"> </span><span class="nx">From</span><span class="w"> </span><span class="o">*</span><span class="nx">Token</span>
<span class="o">+</span><span class="w"> </span><span class="nx">Where</span><span class="w"> </span><span class="o">*</span><span class="nx">Expression</span>
<span class="o">+</span><span class="w"> </span><span class="nx">Limit</span><span class="w"> </span><span class="o">*</span><span class="nx">Expression</span>
<span class="o">+</span><span class="w"> </span><span class="nx">Offset</span><span class="w"> </span><span class="o">*</span><span class="nx">Expression</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And to be a good citizen, we'll fix up the <code>GenerateCode</code>
helper function (for pretty-printing the AST) to
show <code>LIMIT</code> and <code>OFFSET</code>.</p>
<div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">73</span><span class="p">,</span><span class="mi">17</span><span class="w"> </span><span class="o">+</span><span class="mi">75</span><span class="p">,</span><span class="mi">24</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">ss</span><span class="w"> </span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="nx">GenerateCode</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">item</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">item</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="o">-</span><span class="w"> </span><span class="nx">from</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">""</span>
<span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">"SELECT\n"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">item</span><span class="p">,</span><span class="w"> </span><span class="s">",\n"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">From</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="o">-</span><span class="w"> </span><span class="nx">from</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"\nFROM\n\t\"%s\""</span><span class="p">,</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">From</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"\nFROM\n\t\"%s\""</span><span class="p">,</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">From</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="o">-</span><span class="w"> </span><span class="nx">where</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">""</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Where</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="o">-</span><span class="w"> </span><span class="nx">where</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"\nWHERE\n\t%s"</span><span class="p">,</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Where</span><span class="p">.</span><span class="nx">GenerateCode</span><span class="p">())</span>
<span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">"\nWHERE\n\t"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Where</span><span class="p">.</span><span class="nx">GenerateCode</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="o">-</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"SELECT\n%s%s%s;"</span><span class="p">,</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">item</span><span class="p">,</span><span class="w"> </span><span class="s">",\n"</span><span class="p">),</span><span class="w"> </span><span class="nx">from</span><span class="p">,</span><span class="w"> </span><span class="nx">where</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Limit</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">"\nLIMIT\n\t"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Limit</span><span class="p">.</span><span class="nx">GenerateCode</span><span class="p">()</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Offset</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">"\nOFFSET\n\t"</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Offset</span><span class="p">.</span><span class="nx">GenerateCode</span><span class="p">()</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span>
<span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">";"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">type</span><span class="w"> </span><span class="nx">ColumnDefinition</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
</pre></div>
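<p>To see the shape of that change in isolation, here is a minimal,
self-contained sketch. The <code>selectParts</code> type and its string
fields are hypothetical stand-ins for gosql's AST (where the clauses
are pointers to expressions); only the append-if-present pattern is
meant to match.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// selectParts is a hypothetical, simplified stand-in for SelectStatement:
// each clause is already rendered to a string, and "" means "absent".
type selectParts struct {
	Items  []string
	From   string
	Where  string
	Limit  string
	Offset string
}

// generateCode builds the statement one clause at a time, appending a
// clause only when it is present, so adding a new optional clause is
// just one more if-block.
func (sp selectParts) generateCode() string {
	code := "SELECT\n\t" + strings.Join(sp.Items, ",\n\t")
	if sp.From != "" {
		code += fmt.Sprintf("\nFROM\n\t\"%s\"", sp.From)
	}
	if sp.Where != "" {
		code += "\nWHERE\n\t" + sp.Where
	}
	if sp.Limit != "" {
		code += "\nLIMIT\n\t" + sp.Limit
	}
	if sp.Offset != "" {
		code += "\nOFFSET\n\t" + sp.Offset
	}
	return code + ";"
}

func main() {
	sp := selectParts{Items: []string{"id", "name"}, From: "users", Limit: "10", Offset: "20"}
	fmt.Println(sp.generateCode())
}
```

<p>Building up one string this way avoids having to grow a format
string with one more <code>%s</code> for every optional clause.</p>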
<p>That's it for modifying the AST itself. Now we can modify the select
statement parser to look for these two new sections. It's pretty
simple: for both <code>LIMIT</code> and <code>OFFSET</code> first
check if they exist in the current statement and then try to parse the
expression after them, in parser.go:</p>
<div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">285</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">288</span><span class="p">,</span><span class="mi">30</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">p</span><span class="w"> </span><span class="nx">Parser</span><span class="p">)</span><span class="w"> </span><span class="nx">parseSelectStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">Token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimi</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="p">}</span>
<span class="o">+</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">limitToken</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="nx">limit</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">Token</span><span class="p">{</span><span class="nx">offsetToken</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected LIMIT value"</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span>
<span class="o">+</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">Limit</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">limit</span>
<span class="o">+</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span>
<span class="o">+</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">offsetToken</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="nx">offset</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">Token</span><span class="p">{</span><span class="nx">delimiter</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected OFFSET value"</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span>
<span class="o">+</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">Offset</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">offset</span>
<span class="o">+</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">slct</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And the last tricky bit is to make sure that the earlier
optional <code>parseExpression</code> calls know that they can be delimited
by <code>OFFSET</code> and <code>LIMIT</code> (this delimiter
awareness is just how the parser works):</p>
<div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">273</span><span class="p">,</span><span class="mi">9</span><span class="w"> </span><span class="o">+</span><span class="mi">273</span><span class="p">,</span><span class="mi">12</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">p</span><span class="w"> </span><span class="nx">Parser</span><span class="p">)</span><span class="w"> </span><span class="nx">parseSelectStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">Token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimi</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="p">}</span>
<span class="o">+</span><span class="w"> </span><span class="nx">limitToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">LimitKeyword</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="nx">offsetToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">OffsetKeyword</span><span class="p">)</span>
<span class="o">+</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">whereToken</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="o">-</span><span class="w"> </span><span class="nx">where</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">Token</span><span class="p">{</span><span class="nx">delimiter</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="nx">where</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">Token</span><span class="p">{</span><span class="nx">limitToken</span><span class="p">,</span><span class="w"> </span><span class="nx">offsetToken</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected WHERE conditionals"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
</pre></div>
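<p>The delimiter idea is easier to see outside of a diff. The following
is a rough sketch, not gosql's actual parser: tokens are plain strings,
an "expression" is just the run of tokens up to the next delimiter, and
the names <code>parseUntil</code>, <code>parseTail</code>
and <code>clauses</code> are made up for illustration.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// isDelimiter reports whether tok matches any delimiter, ignoring case.
func isDelimiter(tok string, delimiters []string) bool {
	for _, d := range delimiters {
		if strings.EqualFold(tok, d) {
			return true
		}
	}
	return false
}

// parseUntil consumes tokens starting at cursor until it reaches a
// delimiter or the end of input. It fails if nothing was consumed.
func parseUntil(tokens []string, cursor int, delimiters []string) (string, int, bool) {
	start := cursor
	for cursor < len(tokens) && !isDelimiter(tokens[cursor], delimiters) {
		cursor++
	}
	if cursor == start {
		return "", start, false
	}
	return strings.Join(tokens[start:cursor], " "), cursor, true
}

// clauses holds the optional tail of a SELECT statement.
type clauses struct{ Where, Limit, Offset string }

// parseTail parses [WHERE ...] [LIMIT ...] [OFFSET ...] ";". Each clause
// is told which later keywords may end it.
func parseTail(tokens []string) (clauses, error) {
	var c clauses
	cursor := 0
	steps := []struct {
		keyword    string
		dest       *string
		delimiters []string
	}{
		{"WHERE", &c.Where, []string{"LIMIT", "OFFSET", ";"}},
		{"LIMIT", &c.Limit, []string{"OFFSET", ";"}},
		{"OFFSET", &c.Offset, []string{";"}},
	}
	for _, s := range steps {
		if cursor >= len(tokens) || !strings.EqualFold(tokens[cursor], s.keyword) {
			continue // the clause is optional, move on to the next one
		}
		exp, next, ok := parseUntil(tokens, cursor+1, s.delimiters)
		if !ok {
			return c, fmt.Errorf("expected %s value", s.keyword)
		}
		*s.dest = exp
		cursor = next
	}
	if cursor >= len(tokens) || tokens[cursor] != ";" {
		return c, fmt.Errorf("expected semicolon")
	}
	return c, nil
}

func main() {
	tokens := strings.Fields("WHERE id > 1 LIMIT 2 OFFSET 1 ;")
	c, err := parseTail(tokens)
	fmt.Printf("%+v %v\n", c, err)
}
```

<p>If the <code>WHERE</code> step were only delimited by the semicolon,
it would swallow <code>LIMIT 2 OFFSET 1</code> as part of its
expression, which is why the earlier call needs the new delimiters.</p>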
<p>That's it for parsing!</p>
<h3 id="runtime">Runtime</h3><p>Gosql has just one storage backend currently: an in-memory store. To
support <code>LIMIT</code> and <code>OFFSET</code> we need to evaluate
both expressions if they exist. Then while we're iterating through
table rows, after testing whether each row passes
the <code>WHERE</code> filter, we'll check whether the number of rows
that have passed the <code>WHERE</code> filter so far falls within the
range from <code>OFFSET</code> to <code>LIMIT + OFFSET</code>. Rows before
that range are skipped, and iteration stops once we're past it, in memory.go:</p>
<div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">587</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">587</span><span class="p">,</span><span class="mi">33</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="nx">slct</span><span class="w"> </span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="o">+</span><span class="w"> </span><span class="nx">limit</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">Limit</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">Limit</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span>
<span class="o">+</span><span class="w"> </span><span class="nx">limit</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="o">*</span><span class="nx">v</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">())</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">limit</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Invalid, negative limit"</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span>
<span class="o">+</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">Offset</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">Offset</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span>
<span class="o">+</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="o">*</span><span class="nx">v</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">())</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Invalid, negative offset"</span><span class="p">)</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span>
<span class="o">+</span><span class="w"> </span><span class="nx">rowIndex</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">Cell</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">results</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span>
<span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">602</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">629</span><span class="p">,</span><span class="mi">13</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="nx">slct</span><span class="w"> </span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="o">+</span><span class="w"> </span><span class="nx">rowIndex</span><span class="o">++</span>
<span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">rowIndex</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="k">continue</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">rowIndex</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nx">offset</span><span class="o">+</span><span class="nx">limit</span><span class="o">-</span><span class="mi">1</span><span class="w"> </span><span class="p">{</span>
<span class="o">+</span><span class="w"> </span><span class="k">break</span>
<span class="o">+</span><span class="w"> </span><span class="p">}</span>
<span class="o">+</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">finalItems</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">columnName</span><span class="p">,</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="o">*</span><span class="nx">col</span><span class="p">.</span><span class="nx">Exp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
</pre></div>
<p class="note">
Just to call out explicitly: with <code>LIMIT</code>
and <code>OFFSET</code> we still have to check every single row in
the table (at least until we've passed the offset and filled the limit). This should
clearly illustrate why paginating based on <code>LIMIT</code>
and <code>OFFSET</code> is not a great idea for big datasets
<a href="https://use-the-index-luke.com/sql/partial-results/fetch-next-page">compared
to index-based pagination</a>.
</p><p>That's all!</p>
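<p>To make the cost of the note above concrete, here is a minimal standalone sketch. It does not use gosql: the <code>row</code> type and the <code>offsetPage</code> and <code>keysetPage</code> functions are hypothetical names made up for this illustration. It counts how many rows an offset-style scan touches and compares that with a lookup that starts from the last id already seen, assuming the rows are sorted by id.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical stand-in for a table row; it is not a gosql type.
type row struct {
	id   int
	name string
}

// offsetPage mimics the loop in the diff above: rows are visited in order,
// rows before the offset are skipped, and the scan stops once the limit is
// passed. It returns the page and how many rows were examined.
func offsetPage(rows []row, offset, limit int) ([]row, int) {
	page := []row{}
	examined := 0
	rowIndex := -1
	for _, r := range rows {
		examined++
		rowIndex++
		if rowIndex < offset {
			continue
		} else if rowIndex > offset+limit-1 {
			break
		}
		page = append(page, r)
	}
	return page, examined
}

// keysetPage assumes rows are sorted by id. It binary-searches for the first
// row whose id is greater than afterID and returns up to limit rows from
// there, along with how many rows the search had to look at.
func keysetPage(rows []row, afterID, limit int) ([]row, int) {
	probes := 0
	start := sort.Search(len(rows), func(i int) bool {
		probes++
		return rows[i].id > afterID
	})
	end := start + limit
	if end > len(rows) {
		end = len(rows)
	}
	return rows[start:end], probes
}

func main() {
	rows := make([]row, 10000)
	for i := range rows {
		rows[i] = row{id: i + 1, name: fmt.Sprintf("user%d", i+1)}
	}

	page, examined := offsetPage(rows, 1000, 10)
	fmt.Println("offset scan:", len(page), "rows returned,", examined, "rows examined")

	page, probes := keysetPage(rows, 1000, 10)
	fmt.Println("keyset lookup:", len(page), "rows returned,", probes, "rows probed")
}
```

<p>The offset scan has to walk past every skipped row, so its cost grows with the offset. The keyset lookup only needs a handful of probes regardless of how deep the page is, which is the idea behind the index-based pagination linked above.</p>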
<h3 id="trying-it-out">Trying it out</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>cmd/main.go
$<span class="w"> </span>./main
Welcome<span class="w"> </span>to<span class="w"> </span>gosql.
<span class="c1"># create table user (name text, age int);</span>
ok
<span class="c1"># insert into user values ('meg', 2);</span>
ok
<span class="c1"># insert into user values ('jerry', 2);</span>
ok
<span class="c1"># insert into user values ('phil', 1);</span>
ok
<span class="c1"># select * from user;</span>
<span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age
--------+------
<span class="w"> </span>meg<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span>
<span class="w"> </span>jerry<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span>
<span class="w"> </span>phil<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">1</span>
<span class="o">(</span><span class="m">3</span><span class="w"> </span>results<span class="o">)</span>
ok
<span class="c1"># select * from user limit 1;</span>
<span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age
-------+------
<span class="w"> </span>meg<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>result<span class="o">)</span>
ok
<span class="c1"># select * from user where age=1 limit 1;</span>
<span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age
-------+------
<span class="w"> </span>phil<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">1</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>result<span class="o">)</span>
ok
<span class="c1"># select * from user where age=1 limit 4;</span>
<span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age
-------+------
<span class="w"> </span>phil<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">1</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>result<span class="o">)</span>
ok
<span class="c1"># select * from user where age=2 limit 1;</span>
<span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age
-------+------
<span class="w"> </span>meg<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>result<span class="o">)</span>
ok
<span class="c1"># select * from user where age=2 limit 1 offset 1;</span>
<span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age
--------+------
<span class="w"> </span>jerry<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span>
<span class="o">(</span><span class="m">1</span><span class="w"> </span>result<span class="o">)</span>
ok
</pre></div>
<p>Not so hard to hack, is it? Make sure to include some tests!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Working on a prototype SQL-based explorer for data stored in S3 and I needed OFFSET/LIMIT support in the gosql parser. Wrote up a short post on how you can hack in additional syntax and functionality into this SQL engine written in Go.<a href="https://t.co/PyVozTPZ5S">https://t.co/PyVozTPZ5S</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1353372050023456768?ref_src=twsrc%5Etfw">January 24, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/extending-gosql-to-support-limit-and-offset.htmlSat, 23 Jan 2021 00:00:00 +0000
- The year in books: 20 to recommend in 2020http://notes.eatonphil.com/year-in-books-2020.html<p>This year I finished 47 books, up from last year but not a personal
best. The breakdown was 17 non-fiction and 30 fiction. Another 20-30
remain started but unfinished this year.</p>
<h3 id="non-fiction">Non-fiction</h3><p>The 8 non-fiction books I most recommend are:</p>
<ul>
<li><a href="https://www.goodreads.com/book/show/51034048-fashionopolis">Fashionopolis: The Price of Fast Fashion and the Future of Clothes</a> (Must read)</li>
<li><a href="https://www.goodreads.com/book/show/48566725-effective-python">Effective Python: 90 Specific Ways to Write Better Python</a> (Must read; truly excellent for Python programmers, I recommend this to anyone I work with)</li>
<li><a href="https://www.goodreads.com/book/show/93904.The_Machine_That_Changed_the_World">The Machine that Changed the World</a> (Must read)</li>
<li><a href="https://www.goodreads.com/book/show/16043511-europe">Europe: The Struggle for Supremacy from 1453 to the Present</a></li>
<li><a href="https://www.goodreads.com/book/show/19606799-wind-sand-and-stars">Wind, Sand and Stars</a></li>
<li><a href="https://www.goodreads.com/book/show/11169043-american-colossus">American Colossus: The Triumph of Capitalism, 1865-1900</a></li>
<li><a href="https://www.goodreads.com/book/show/2360599.Making_Common_Sense_of_Japan">Making Common Sense of Japan</a></li>
<li><a href="https://www.goodreads.com/book/show/8155672-the-german-genius">The German Genius</a></li>
</ul>
<p>The 3 books I recommend you not waste time on are: "The Two
Koreas", "The Price of Inequality", and "Ninety Percent of Everything:
Inside Shipping".</p>
<h4 id="the-whole-list">The whole list</h4><ul>
<li><a href="https://www.goodreads.com/book/show/235560.The_Two_Koreas">The Two Koreas</a> by Don Oberdorfer<ul>
<li>Interesting but not a huge fan, seemed pretty biased against South Korea somehow</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/88546.Forbidden_Nation">Forbidden Nation: A History of Taiwan</a> by Jonathan Manthorpe</li>
<li><a href="https://www.goodreads.com/book/show/48566725-effective-python">Effective Python: 90 Specific Ways to Write Better Python</a> by Brett Slatkin</li>
<li><a href="https://www.goodreads.com/book/show/43701534-a-philosophy-of-software-design">A Philosophy of Software Design</a> by John Ousterhout<ul>
<li>Came as a recommendation from someone on Twitter, ultimately not a huge fan. Still looking for high quality books on software design</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/16031130-the-price-of-inequality">The Price of Inequality</a> by Joseph E. Stiglitz<ul>
<li>Agreed with the premise but the book was incoherent and too self-assuring</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/18930203-paris-reborn">Paris Reborn: Napoléon III, Baron Haussmann, and the Quest to Build a Modern City</a> by Stephane Kirkland</li>
<li><a href="https://www.goodreads.com/book/show/51034048-fashionopolis">Fashionopolis: The Price of Fast Fashion and the Future of Clothes</a> by Dana Thomas</li>
<li><a href="https://www.goodreads.com/book/show/6603103-a-moveable-feast">A Moveable Feast</a> by Ernest Hemingway<ul>
<li>I normally love Hemingway's writing but this particular book was not very coherent</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/16043511-europe">Europe: The Struggle for Supremacy from 1453 to the Present</a> by Brendan Simms<ul>
<li>Such an excellent introduction to the continent for Americans who otherwise don't have great background</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/19606799-wind-sand-and-stars">Wind, Sand and Stars</a> by Antoine de Saint-Exupéry<ul>
<li>A beautiful memoir of flights by the author of The Little Prince, very similar in style to Hemingway</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/11169043-american-colossus">American Colossus: The Triumph of Capitalism, 1865-1900</a> by H.W. Brands<ul>
<li>Baby's first primer on unions (I need more recommendations on the history of unions)</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/2360599.Making_Common_Sense_of_Japan">Making Common Sense of Japan</a> by Steven R. Reed<ul>
<li>It can be difficult to find English translations of Korean and Japanese history by Korean and Japanese authors; this is a good one by an American professor</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/93904.The_Machine_That_Changed_the_World">The Machine that Changed the World</a> by James P. Womack<ul>
<li>An excellent, well-researched history of automobile manufacturing in the US, Europe and Japan from the 1900s to 1990; how Japan ate everyone's lunch</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/40620.The_United_States_of_Europe">The United States of Europe</a> by T.R. Reid<ul>
<li>Very light introduction to the European Union</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/7090.The_Soul_of_a_New_Machine">The Soul of a New Machine</a> by Tracy Kidder<ul>
<li>Overhyped by the internets, but not bad</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/8155672-the-german-genius">The German Genius</a> by Peter Watson<ul>
<li>Dense but excellent introduction to many famous Germans in many fields throughout time</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/18626537-ninety-percent-of-everything">Ninety Percent of Everything: Inside Shipping</a> by Rose George</li>
</ul>
<h3 id="fiction">Fiction</h3><p>I'm trying to read more from non-English authors. If you know of
non-English authors in the vein of the ones listed here,
I'd love to hear from you.</p>
<p>The 12 fiction books I most recommend are:</p>
<ul>
<li><a href="https://www.goodreads.com/book/show/11607290-planet-of-the-apes">Planet of the Apes</a> (Must read, yes even if you've seen the film)</li>
<li><a href="https://www.goodreads.com/book/show/18882869-all-quiet-on-the-western-front">All Quiet on the Western Front</a> (Must read)</li>
<li><a href="https://www.goodreads.com/book/show/26167126-the-mouse-that-roared">The Mouse That Roared</a> (Must read)</li>
<li><a href="https://www.goodreads.com/book/show/25171354-the-dead-mountaineer-s-inn">The Dead Mountaineer's Inn</a></li>
<li><a href="https://www.goodreads.com/book/show/17406654-the-golem-and-the-jinni">The Golem and the Jinni</a></li>
<li><a href="https://www.goodreads.com/book/show/38886181-neverwhere">Neverwhere</a></li>
<li><a href="https://www.goodreads.com/book/show/35901747-dubliners">Dubliners</a></li>
<li><a href="https://www.goodreads.com/book/show/36510196-old-man-s-war">Old Man's War</a></li>
<li><a href="https://www.goodreads.com/book/show/38453346-the-inspector-barlach-mysteries">The Inspector Barlach Mysteries: The Judge and His Hangman and Suspicion</a></li>
<li><a href="https://www.goodreads.com/book/show/18842344-fant-mas">Fantômas</a></li>
<li><a href="https://www.goodreads.com/book/show/40793127-foundation">Foundation</a></li>
<li><a href="https://www.goodreads.com/book/show/13380806-out-of-the-silent-planet">Out of the Silent Planet</a></li>
</ul>
<p>The only book I really didn't like was "Invisible Cities".</p>
<h4 id="the-whole-list">The whole list</h4><ul>
<li><a href="https://www.goodreads.com/book/show/18782460-march-violets">March Violets</a> by Philip Kerr (Scottish)</li>
<li><a href="https://www.goodreads.com/book/show/25299696-liberty-bar">Liberty Bar</a> by Georges Simenon (Belgian)</li>
<li><a href="https://www.goodreads.com/book/show/20018218-the-late-monsieur-gallet">The Late Monsieur Gallet</a> by Georges Simenon (Belgian)</li>
<li><a href="https://www.goodreads.com/book/show/35901747-dubliners">Dubliners</a> by James Joyce (Irish)</li>
<li><a href="https://www.goodreads.com/book/show/11580940-tales-of-the-city">Tales of the City</a> by Armistead Maupin (American)</li>
<li><a href="https://www.goodreads.com/book/show/52971537-the-third-policeman">The Third Policeman</a> by Flann O'Brien (Irish)</li>
<li><a href="https://www.goodreads.com/book/show/6522120-44-scotland-street">44 Scotland Street</a> by Alexander McCall Smith (British-African)</li>
<li><a href="https://www.goodreads.com/book/show/23209197-knots-and-crosses">Knots and Crosses</a> by Ian Rankin (Scottish)</li>
<li><a href="https://www.goodreads.com/book/show/35598044-i-hear-your-voice">I Hear Your Voice</a> by Kim Young Ha (South Korean)</li>
<li><a href="https://www.goodreads.com/book/show/17406654-the-golem-and-the-jinni">The Golem and the Jinni</a> by Helene Wecker (American)</li>
<li><a href="https://www.goodreads.com/book/show/25541152-the-tokyo-zodiac-murders">The Tokyo Zodiac Murders</a> by Shimada Sōji (Japanese)</li>
<li><a href="https://www.goodreads.com/book/show/8130077-the-screwtape-letters">The Screwtape Letters</a> by C.S. Lewis (English)</li>
<li><a href="https://www.goodreads.com/book/show/38886181-neverwhere">Neverwhere</a> by Neil Gaiman (English)</li>
<li><a href="https://www.goodreads.com/book/show/36510196-old-man-s-war">Old Man's War</a> by John Scalzi (American)</li>
<li><a href="https://www.goodreads.com/book/show/9285319-tales-from-earthsea">Tales from Earthsea</a> by Ursula K. Le Guin (American)</li>
<li><a href="https://www.goodreads.com/book/show/23632478-solaris">Solaris</a> by Stanisław Lem (Polish)</li>
<li><a href="https://www.goodreads.com/book/show/16029682-a-wizard-of-earthsea">A Wizard of Earthsea</a> by Ursula K. Le Guin (American)</li>
<li><a href="https://www.goodreads.com/book/show/11607290-planet-of-the-apes">Planet of the Apes</a> by Pierre Boulle (French)</li>
<li><a href="https://www.goodreads.com/book/show/25171354-the-dead-mountaineer-s-inn">The Dead Mountaineer's Inn</a> by Arkady Strugatsky (Russian)</li>
<li><a href="https://www.goodreads.com/book/show/49605492-invisible-cities">Invisible Cities</a> by Italo Calvino (Cuban-born Italian)</li>
<li><a href="https://www.goodreads.com/book/show/38453346-the-inspector-barlach-mysteries">The Inspector Barlach Mysteries: The Judge and His Hangman and Suspicion</a> by Friedrich Dürrenmatt (Swiss)</li>
<li><a href="https://www.goodreads.com/book/show/18842344-fant-mas">Fantômas</a> by Marcel Allain (French)</li>
<li><a href="https://www.goodreads.com/book/show/18882869-all-quiet-on-the-western-front">All Quiet on the Western Front</a> by Erich Maria Remarque (German)</li>
<li><a href="https://www.goodreads.com/book/show/22346782-a-crime-in-holland">A Crime in Holland</a> by Georges Simenon (Belgian)</li>
<li><a href="https://www.goodreads.com/book/show/32076294-the-wonderful-adventure-of-nils-holgersson">The Wonderful Adventure of Nils Holgersson</a> by Selma Lagerlöf (Swedish)</li>
<li><a href="https://www.goodreads.com/book/show/40793127-foundation">Foundation</a> by Isaac Asimov (Russian-born American)</li>
<li><a href="https://www.goodreads.com/book/show/13380806-out-of-the-silent-planet">Out of the Silent Planet</a> by C.S. Lewis (English)</li>
<li><a href="https://www.goodreads.com/book/show/19847968-the-spy-who-came-in-from-the-cold">The Spy Who Came in from the Cold</a> by John le Carré (English)</li>
<li><a href="https://www.goodreads.com/book/show/19792871-the-bat">The Bat</a> by Jo Nesbø (Norwegian)</li>
<li><a href="https://www.goodreads.com/book/show/26167126-the-mouse-that-roared">The Mouse That Roared</a> by Leonard Wibberley (Irish-born American)</li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Out of 47 books read this year, here's the 20 I recommend to you (gave them 4/5 stars or better). I'm trying to read more non-English authors so I'd love to hear if there are authors with similar style on this list you'd recommend!<a href="https://t.co/FjHcvHpRSr">https://t.co/FjHcvHpRSr</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1343242325791805447?ref_src=twsrc%5Etfw">December 27, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/year-in-books-2020.htmlSun, 27 Dec 2020 00:00:00 +0000
- Static analysis with semgrep: practical examples using Dockerhttp://notes.eatonphil.com/static-analysis-with-semgrep.html<p>In this post we'll get a basic semgrep environment set up in Docker
running some custom rules against our code.</p>
<h3 id="existing-linters">Existing linters</h3><p>Linters like <a href="https://www.pylint.org/">pylint</a> for Python or
<a href="https://eslint.org/">eslint</a> for JavaScript are great for general,
broad language standards. But what about common nits in code review
like using print statements instead of a logger, or using a defer
statement inside a for loop (Go specific), or the existence of
multiple nested loops?</p>
<p>Most developers don't have experience working with language
parsing. So it's fairly uncommon in small- and medium-sized teams to
see custom linting rules. And while no single linter or language is
that much more complex than any other (it's all just AST operations),
there is a small penalty to learning the AST and framework for each
language linter.</p>
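<p>As a rough sketch of what "just AST operations" looks like in one language, here is a standalone Go program that flags <code>defer</code> statements inside loops using the standard <code>go/parser</code> and <code>go/ast</code> packages. The <code>findDefersInLoops</code> function and the embedded <code>src</code> input are hypothetical, made up for this illustration. This is a hand-rolled check, not how any existing linter implements it.</p>

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a made-up input file used only for this illustration.
const src = `package main

import "os"

func main() {
	for _, name := range []string{"a", "b"} {
		f, _ := os.Open(name)
		defer f.Close()
	}
}
`

// findDefersInLoops returns the line numbers of defer statements that sit
// directly inside a for or range loop body. Nested loops are handled when
// the outer walk reaches them, and function literals are skipped because a
// defer inside a closure runs when that closure returns.
func findDefersInLoops(filename, source string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, source, 0)
	if err != nil {
		return nil, err
	}

	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		var body *ast.BlockStmt
		switch loop := n.(type) {
		case *ast.ForStmt:
			body = loop.Body
		case *ast.RangeStmt:
			body = loop.Body
		default:
			return true
		}
		ast.Inspect(body, func(inner ast.Node) bool {
			switch d := inner.(type) {
			case *ast.ForStmt, *ast.RangeStmt, *ast.FuncLit:
				return false
			case *ast.DeferStmt:
				lines = append(lines, fset.Position(d.Pos()).Line)
			}
			return true
		})
		return true
	})
	return lines, nil
}

func main() {
	lines, err := findDefersInLoops("example.go", src)
	if err != nil {
		panic(err)
	}
	for _, line := range lines {
		fmt.Printf("example.go:%d: defer inside a loop\n", line)
	}
}
```

<p>Even for a check this small you need to know the node types, the traversal helpers and the position bookkeeping for this one language, and none of that carries over to a Python or JavaScript linter.</p>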
<h3 id="semgrep">Semgrep</h3><p><a href="https://semgrep.dev/">Semgrep</a> is a generic tool for finding patterns
in source code. Unlike traditional regex (and traditional grep) it can
find recursive patterns. This makes it an especially useful tool to
learn for finding patterns in any language.</p>
<p>An advantage of semgrep rules is that you can learn the semgrep
pattern matching syntax (which is surprisingly easy) and then
write rules for any language you'd like.</p>
<p>And while the <a href="https://semgrep.dev/editor">online rule tester</a> is
awesome, I had a hard time going from that to a working sample on my
own laptop with Docker. We'll do just that.</p>
<h3 id="catching-print-statements-in-python">Catching print statements in Python</h3><p>Let's say we want a script to fail on any use of print statements in
Python:</p>
<div class="highlight"><pre><span></span><span class="err">$</span> <span class="n">cat</span> <span class="n">test</span><span class="o">/</span><span class="n">python</span><span class="o">/</span><span class="n">simple</span><span class="o">-</span><span class="nb">print</span><span class="o">.</span><span class="n">py</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"DEBUG: here"</span><span class="p">)</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"DEBUG: "</span><span class="p">,</span> <span class="s2">"now here"</span><span class="p">)</span>
</pre></div>
<p>The current <a href="https://semgrep.dev/editor">default example</a> shown in the
online editor happens to be for just this. Click the Advanced tab and
you'll see the following:</p>
<div class="highlight"><pre><span></span><span class="nt">rules</span><span class="p">:</span>
<span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fail-on-print</span>
<span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">print("...")</span>
<span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">Semgrep found a match</span>
<span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">WARNING</span>
</pre></div>
<p>Copy this into <code>config.yml</code>. Let's modify the pattern to
warn on all print calls, not just print calls with a single string
argument:</p>
<div class="highlight"><pre><span></span><span class="nt">rules</span><span class="p">:</span>
<span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fail-on-print</span>
<span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">print(...)</span>
<span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">Semgrep found a match</span>
<span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">WARNING</span>
</pre></div>
<p>The editor doesn't mention it (nor do any docs I can find) but we also
need to include two keys in the individual rule
object: <code>mode</code> and <code>languages</code>.</p>
<div class="highlight"><pre><span></span><span class="nt">rules</span><span class="p">:</span>
<span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fail-on-print</span>
<span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">print(...)</span>
<span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">Semgrep found a match</span>
<span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">WARNING</span>
<span class="w"> </span><span class="nt">mode</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">search</span>
<span class="w"> </span><span class="nt">languages</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">"generic"</span><span class="p p-Indicator">]</span>
</pre></div>
<p>Semgrep fails really weirdly if you set <code>mode</code> to
anything other than <code>search</code>, but it won't warn you that
what you set is garbage. The <code>languages</code> setting is
similarly fickle and doesn't give you much feedback if you set it
incorrectly.</p>
<p class="note">
Also, I'm using the "generic" language here because I don't
understand the difference between languages and as far as I'm
concerned the syntax I'm using here is already pretty generic.
</p><p>We run the semgrep Docker image:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>docker<span class="w"> </span>run<span class="w"> </span>-v<span class="w"> </span><span class="s2">"</span><span class="si">${</span><span class="nv">PWD</span><span class="si">}</span><span class="s2">:/src"</span><span class="w"> </span>returntocorp/semgrep<span class="w"> </span>--config<span class="o">=</span>config.yml<span class="w"> </span>test/python
A<span class="w"> </span>new<span class="w"> </span>version<span class="w"> </span>of<span class="w"> </span>Semgrep<span class="w"> </span>is<span class="w"> </span>available.<span class="w"> </span>Please<span class="w"> </span>see<span class="w"> </span>https://github.com/returntocorp/semgrep#upgrading<span class="w"> </span><span class="k">for</span><span class="w"> </span>more<span class="w"> </span>information.
running<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules...
test/python/simple-print.py
severity:warning<span class="w"> </span>rule:fail-on-print:<span class="w"> </span>Semgrep<span class="w"> </span>found<span class="w"> </span>a<span class="w"> </span>match
<span class="m">2</span>:print<span class="o">(</span><span class="s2">"DEBUG: here"</span><span class="o">)</span>
ran<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules<span class="w"> </span>on<span class="w"> </span><span class="m">1</span><span class="w"> </span>files:<span class="w"> </span><span class="m">1</span><span class="w"> </span>findings
</pre></div>
<p>And there we've got our warning!</p>
<p class="note">
Not completely clear to me why we're getting warned about a new
version when we've pulled <code>latest</code> as the linked docs
suggest. Maybe there's a newer version that hasn't made it into a
Docker image yet.
</p><h3 id="catching-fmt.print*-statements-in-go">Catching fmt.Print* statements in Go</h3><p>Let's say we also want to fail on print statements in Go (because we
should use a logger instead):</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="nx">cat</span><span class="w"> </span><span class="nx">test</span><span class="o">/</span><span class="nx">golang</span><span class="o">/</span><span class="nx">simple</span><span class="o">-</span><span class="nx">print</span><span class="p">.</span><span class="k">go</span>
<span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="s">"fmt"</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"here"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">a</span><span class="p">)</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"%s\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">)</span>
<span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"My crazy error"</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>We could warn on any <code>import "fmt"</code> in a file,
but that would also flag uses of <code>fmt.Sprintf</code>
or <code>fmt.Errorf</code>, which are fine. Instead we'll just focus on
uses of <code>fmt.Printf</code> or <code>fmt.Println</code>:</p>
<div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">$ cat go-config.yml</span>
<span class="l l-Scalar l-Scalar-Plain">rules</span><span class="p p-Indicator">:</span>
<span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fail-on-print</span>
<span class="w"> </span><span class="nt">pattern-either</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fmt.Printf(...)</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fmt.Println(...)</span>
<span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">Semgrep found a match</span>
<span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">WARNING</span>
<span class="w"> </span><span class="nt">mode</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">search</span>
<span class="w"> </span><span class="nt">languages</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">"generic"</span><span class="p p-Indicator">]</span>
</pre></div>
<p>Run the Go config against the Go files:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>docker<span class="w"> </span>run<span class="w"> </span>-v<span class="w"> </span><span class="s2">"</span><span class="si">${</span><span class="nv">PWD</span><span class="si">}</span><span class="s2">:/src"</span><span class="w"> </span>returntocorp/semgrep<span class="w"> </span>--config<span class="o">=</span>go-config.yml<span class="w"> </span>test/golang
A<span class="w"> </span>new<span class="w"> </span>version<span class="w"> </span>of<span class="w"> </span>Semgrep<span class="w"> </span>is<span class="w"> </span>available.<span class="w"> </span>Please<span class="w"> </span>see<span class="w"> </span>https://github.com/returntocorp/semgrep#upgrading<span class="w"> </span><span class="k">for</span><span class="w"> </span>more<span class="w"> </span>information.
running<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules...
test/golang/simple-print.go
severity:warning<span class="w"> </span>rule:fail-on-print:<span class="w"> </span>Semgrep<span class="w"> </span>found<span class="w"> </span>a<span class="w"> </span>match
<span class="m">8</span>:fmt.Printf<span class="o">(</span><span class="s2">"%s\n"</span>,<span class="w"> </span>a<span class="o">)</span>
--------------------------------------------------------------------------------
<span class="m">7</span>:fmt.Println<span class="o">(</span>a<span class="o">)</span>
ran<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules<span class="w"> </span>on<span class="w"> </span><span class="m">1</span><span class="w"> </span>files:<span class="w"> </span><span class="m">2</span><span class="w"> </span>findings
</pre></div>
<p>Cool! Making some sense. Now let's try a harder pattern.</p>
<h3 id="catching-triple-nested-for-loops">Catching triple-nested for loops</h3><p>Let's try to warn on the triple-nested loop in this code:</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="nx">cat</span><span class="w"> </span><span class="nx">test</span><span class="o">/</span><span class="nx">golang</span><span class="o">/</span><span class="nx">loopy</span><span class="p">.</span><span class="k">go</span>
<span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="s">"log"</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">doneFirst</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mi">100</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">j</span>
<span class="w"> </span><span class="nx">going</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">going</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">k</span><span class="o">++</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="nx">k</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">doneFirst</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>If we want to catch the use of nested for loops here then we'll need
to search for the loops surrounded by arbitrary
syntax. Semgrep's <code>...</code> syntax makes this easy.</p>
<div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">$ cat go-config2.yml</span>
<span class="l l-Scalar l-Scalar-Plain">rules</span><span class="p p-Indicator">:</span>
<span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fail-on-3-loop</span>
<span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">for ... {</span>
<span class="w"> </span><span class="no">...</span>
<span class="w"> </span><span class="no">for ... {</span>
<span class="w"> </span><span class="no">...</span>
<span class="w"> </span><span class="no">for ... {</span>
<span class="w"> </span><span class="no">...</span>
<span class="w"> </span><span class="no">}</span>
<span class="w"> </span><span class="no">...</span>
<span class="w"> </span><span class="no">}</span>
<span class="w"> </span><span class="no">...</span>
<span class="w"> </span><span class="no">}</span>
<span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">Semgrep found a match</span>
<span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">WARNING</span>
<span class="w"> </span><span class="nt">mode</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">search</span>
<span class="w"> </span><span class="nt">languages</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">"generic"</span><span class="p p-Indicator">]</span>
</pre></div>
<p>And run semgrep:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>docker<span class="w"> </span>run<span class="w"> </span>-v<span class="w"> </span><span class="s2">"</span><span class="si">${</span><span class="nv">PWD</span><span class="si">}</span><span class="s2">:/src"</span><span class="w"> </span>returntocorp/semgrep<span class="w"> </span>--config<span class="o">=</span>go-config2.yml<span class="w"> </span>test/golang
A<span class="w"> </span>new<span class="w"> </span>version<span class="w"> </span>of<span class="w"> </span>Semgrep<span class="w"> </span>is<span class="w"> </span>available.<span class="w"> </span>Please<span class="w"> </span>see<span class="w"> </span>https://github.com/returntocorp/semgrep#upgrading<span class="w"> </span><span class="k">for</span><span class="w"> </span>more<span class="w"> </span>information.
running<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules...
test/golang/loopy.go
severity:warning<span class="w"> </span>rule:fail-on-3-loop:<span class="w"> </span>Semgrep<span class="w"> </span>found<span class="w"> </span>a<span class="w"> </span>match
<span class="m">7</span>:for<span class="w"> </span>i<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">;</span><span class="w"> </span>i<span class="w"> </span><<span class="w"> </span><span class="m">10</span><span class="p">;</span><span class="w"> </span>i++<span class="w"> </span><span class="o">{</span>
<span class="m">8</span>:<span class="w"> </span>log.Print<span class="o">(</span>i<span class="o">)</span>
<span class="m">9</span>:
<span class="m">10</span>:<span class="w"> </span><span class="k">for</span><span class="w"> </span>j<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">;</span><span class="w"> </span>j<span class="w"> </span><<span class="w"> </span><span class="m">100</span><span class="p">;</span><span class="w"> </span>j++<span class="w"> </span><span class="o">{</span>
<span class="m">11</span>:<span class="w"> </span>c<span class="w"> </span>:<span class="o">=</span><span class="w"> </span>i<span class="w"> </span>*<span class="w"> </span>j
<span class="m">12</span>:
<span class="m">13</span>:<span class="w"> </span>going<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="nb">true</span>
<span class="m">14</span>:<span class="w"> </span>k<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="m">0</span>
<span class="m">15</span>:<span class="w"> </span><span class="k">for</span><span class="w"> </span>going<span class="w"> </span><span class="o">{</span>
<span class="m">16</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">k</span><span class="w"> </span><span class="o">==</span><span class="w"> </span>c<span class="w"> </span><span class="o">{</span>
--------<span class="w"> </span><span class="o">[</span>hid<span class="w"> </span><span class="m">10</span><span class="w"> </span>additional<span class="w"> </span>lines,<span class="w"> </span>adjust<span class="w"> </span>with<span class="w"> </span>--max-lines-per-finding<span class="o">]</span><span class="w"> </span>--------
ran<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules<span class="w"> </span>on<span class="w"> </span><span class="m">2</span><span class="w"> </span>files:<span class="w"> </span><span class="m">1</span><span class="w"> </span>findings
</pre></div>
<p>That's just swell.</p>
<h3 id="limits-of-static-analysis">Limits of static analysis</h3><p>Now let's say we refactor one of the inner loops into its own
function.</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="nx">cat</span><span class="w"> </span><span class="nx">test</span><span class="o">/</span><span class="nx">golang</span><span class="o">/</span><span class="nx">loopy</span><span class="p">.</span><span class="k">go</span>
<span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="s">"log"</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">inner</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">j</span>
<span class="w"> </span><span class="nx">going</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">going</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">k</span><span class="o">++</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="nx">k</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">doneFirst</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mi">100</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">inner</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">j</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">doneFirst</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>And run semgrep again:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>docker<span class="w"> </span>run<span class="w"> </span>-v<span class="w"> </span><span class="s2">"</span><span class="si">${</span><span class="nv">PWD</span><span class="si">}</span><span class="s2">:/src"</span><span class="w"> </span>returntocorp/semgrep<span class="w"> </span>--config<span class="o">=</span>go-config2.yml<span class="w"> </span>test/golang
<span class="w"> </span>A<span class="w"> </span>new<span class="w"> </span>version<span class="w"> </span>of<span class="w"> </span>Semgrep<span class="w"> </span>is<span class="w"> </span>available.<span class="w"> </span>Please<span class="w"> </span>see<span class="w"> </span>https://github.com/returntocorp/semgrep#upgrading<span class="w"> </span><span class="k">for</span><span class="w"> </span>more<span class="w"> </span>information.
<span class="w"> </span>running<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules...
<span class="w"> </span>ran<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules<span class="w"> </span>on<span class="w"> </span><span class="m">2</span><span class="w"> </span>files:<span class="w"> </span><span class="m">0</span><span class="w"> </span>findings
</pre></div>
<p>Well great. The triple-nested loop still exists, but semgrep can't find
it because the nesting is no longer syntactically visible.</p>
<p>At this point we'd need to start getting into linting based on runtime
analysis. If you know of a tool that does this and lets you write
rules like semgrep for it, please tell me!</p>
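<p>To see concretely why the refactor defeats a purely syntactic rule,
here is a minimal sketch (not how semgrep works internally) that uses
Go's standard <code>go/ast</code> package to measure loop nesting per
function. The helpers <code>maxLoopDepth</code> and <code>loopDepths</code>
and the trimmed-down source string are assumptions made up for this
illustration.</p>

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a trimmed-down stand-in for the refactored loopy.go (hypothetical).
const src = `package main

func inner(c int) {
	k := 0
	for k != c {
		k++
	}
}

func main() {
	for i := 0; i < 10; i++ {
		for j := 0; j < 100; j++ {
			inner(i * j)
		}
	}
}
`

// maxLoopDepth returns the deepest syntactic nesting of for/range
// statements found under root. It never follows function calls.
func maxLoopDepth(root ast.Node) int {
	depth, deepest := 0, 0
	var isLoop []bool // one entry per node currently being visited
	ast.Inspect(root, func(n ast.Node) bool {
		if n == nil {
			// Leaving a node: undo the depth change if it was a loop.
			if isLoop[len(isLoop)-1] {
				depth--
			}
			isLoop = isLoop[:len(isLoop)-1]
			return true
		}
		loop := false
		switch n.(type) {
		case *ast.ForStmt, *ast.RangeStmt:
			loop = true
			depth++
			if depth > deepest {
				deepest = depth
			}
		}
		isLoop = append(isLoop, loop)
		return true
	})
	return deepest
}

// loopDepths parses src and maps each function name to its loop depth.
func loopDepths(src string) (map[string]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "loopy.go", src, 0)
	if err != nil {
		return nil, err
	}
	depths := map[string]int{}
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			depths[fn.Name.Name] = maxLoopDepth(fn)
		}
	}
	return depths, nil
}

func main() {
	depths, err := loopDepths(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("inner:", depths["inner"], "main:", depths["main"])
}
```

<p>Neither function reaches a depth of three on its own, so a check
that only looks at syntax within one function body has nothing to
report. Catching this case would mean following the call from
<code>main</code> into <code>inner</code>, which this sketch deliberately
does not attempt.</p>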
<h3 id="in-summary">In summary</h3><p>In the end though, it's still very useful to be able to learn a single
language for writing syntax rules at a high level to enforce behavior
in code. Furthermore, a generic syntax matcher helps you easily
write rules for things that don't already have linters, like YAML
or JSON configuration or Vagrantfiles.</p>
<p>It can be annoying to work around some missing docs in semgrep but
overall it's a great tool for the kit.</p>
http://notes.eatonphil.com/static-analysis-with-semgrep.htmlSun, 20 Dec 2020 00:00:00 +0000
- Emulating linux/AMD64 userland: interpreting an ELF binaryhttp://notes.eatonphil.com/emulating-amd64-starting-with-elf.html<p>In this post we'll stumble toward a working emulator for a barebones C
program compiled for linux/AMD64. The approach will be based more on
observation than on following a spec: a great way
to quickly become familiar with a topic, and a bad way to guarantee
correctness.</p>
<p>The goal:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>tests/simple.c
int<span class="w"> </span>main<span class="o">()</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="m">4</span><span class="p">;</span>
<span class="o">}</span>
$<span class="w"> </span>gcc<span class="w"> </span>tests/simple.c
$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main
$<span class="w"> </span>./main<span class="w"> </span>a.out<span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">4</span>
</pre></div>
<p>This may look ridiculously simple but when you don't know how to deal
with a binary or how instructions are encoded, it will take a few
hours to write an emulator that can generally handle this program!</p>
<p>Code for this project is <a href="https://github.com/eatonphil/go-amd64-emulator">available on GitHub</a>.</p>
<h3 id="background">Background</h3><p>AMD64, x86_64 or x64 are different names for AMD's widely adopted
64-bit extension to Intel's x86 instruction set (i.e. the encoding and
semantics of x86 binaries). AMD64 is a superset of x86 (introducing
64-bit registers and operations) and thus backwards compatible with
x86 programs.</p>
<p class="note">
A year and a half ago I first got into emulation with
an <a href="https://notes.eatonphil.com/emulator-basics-a-stack-and-register-machine.html">AMD64
emulator in JavaScript</a>. The JavaScript emulator interpreted the
textual representation of AMD64 programs (e.g. <code>MOV RBP,
RSP</code>, Intel's assembly syntax). A C program had to be compiled
with <code>-S</code> to produce an assembly file that the JavaScript
emulator could read (i.e. <code>gcc -S tests/simple.c</code>). This
was a great way to get started with emulation by ignoring the
complexity of encoded instructions and executable formats.
</p><p>If we dig into the binary file produced by gcc on Linux we learn that
it is an <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">ELF
file</a>.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>gcc<span class="w"> </span>tests/simple.c
$<span class="w"> </span>file<span class="w"> </span>a.out
a.out:<span class="w"> </span>ELF<span class="w"> </span><span class="m">64</span>-bit<span class="w"> </span>LSB<span class="w"> </span>executable,<span class="w"> </span>x86-64,<span class="w"> </span>version<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">(</span>SYSV<span class="o">)</span>,<span class="w"> </span>dynamically<span class="w"> </span>linked,<span class="w"> </span>interpreter<span class="w"> </span>/lib64/ld-linux-x86-64.so.2,<span class="w"> </span>BuildID<span class="o">[</span>sha1<span class="o">]=</span>d0b5c742b9fbcbcca4dfa9438a8437a8478a51bb,<span class="w"> </span><span class="k">for</span><span class="w"> </span>GNU/Linux<span class="w"> </span><span class="m">3</span>.2.0,<span class="w"> </span>not<span class="w"> </span>stripped
</pre></div>
<p>ELF is responsible for surrounding the actual binary-encoded program
instructions with metadata on exported and imported C identifiers and
the program entrypoint. But for a simple program like the one this initial
emulator targets, we can ignore exports/imports. We'll only use the ELF
metadata to find out where the instructions for our <code>main</code>
function start.</p>
<h3 id="where-is-main?">Where is main?</h3><p>If we use an ELF reader+disassembler on the binary generated by gcc
and search for <code>main</code> we can find its address.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-d<span class="w"> </span>a.out<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-A10<span class="w"> </span><span class="s1">'<main>'</span>
<span class="m">0000000000401106</span><span class="w"> </span><main>:
<span class="w"> </span><span class="m">401106</span>:<span class="w"> </span><span class="m">55</span><span class="w"> </span>push<span class="w"> </span>%rbp
<span class="w"> </span><span class="m">401107</span>:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>mov<span class="w"> </span>%rsp,%rbp
<span class="w"> </span>40110a:<span class="w"> </span>b8<span class="w"> </span>fe<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>mov<span class="w"> </span><span class="nv">$0</span>xfe,%eax
<span class="w"> </span>40110f:<span class="w"> </span>5d<span class="w"> </span>pop<span class="w"> </span>%rbp
<span class="w"> </span><span class="m">401110</span>:<span class="w"> </span>c3<span class="w"> </span>retq
<span class="w"> </span><span class="m">401111</span>:<span class="w"> </span><span class="m">66</span><span class="w"> </span>2e<span class="w"> </span>0f<span class="w"> </span>1f<span class="w"> </span><span class="m">84</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>nopw<span class="w"> </span>%cs:0x0<span class="o">(</span>%rax,%rax,1<span class="o">)</span>
<span class="w"> </span><span class="m">401118</span>:<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span>
<span class="w"> </span>40111b:<span class="w"> </span>0f<span class="w"> </span>1f<span class="w"> </span><span class="m">44</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>nopl<span class="w"> </span>0x0<span class="o">(</span>%rax,%rax,1<span class="o">)</span>
<span class="m">0000000000401120</span><span class="w"> </span><__libc_csu_init>:
</pre></div>
<p>This means that the function, <code>main</code>, starts at
address <code>0x401106</code> in memory. Furthermore, this implies
that the binary must be loaded into CPU memory such that the CPU can
jump here to execute our program.</p>
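<p>Each row of the objdump listing above is an address, the raw
instruction bytes, and then the mnemonic. As a small illustration of how
those bytes map to an instruction, the <code>mov</code> row's bytes
<code>b8 fe 00 00 00</code> are a one-byte opcode followed by a four-byte
little-endian immediate. This is only a sketch;
<code>decodeMovEAX</code> is a hypothetical helper, not code from the
emulator.</p>

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeMovEAX decodes the five-byte "mov $imm32, %eax" encoding:
// opcode 0xb8 followed by a 32-bit little-endian immediate.
// (Hypothetical helper for illustration only.)
func decodeMovEAX(code []byte) (uint32, error) {
	if len(code) < 5 || code[0] != 0xb8 {
		return 0, errors.New("not a mov imm32 to eax")
	}
	return binary.LittleEndian.Uint32(code[1:5]), nil
}

func main() {
	// Bytes copied from the row at address 40110a in the listing.
	imm, err := decodeMovEAX([]byte{0xb8, 0xfe, 0x00, 0x00, 0x00})
	if err != nil {
		panic(err)
	}
	fmt.Printf("mov $%#x, %%eax\n", imm)
}
```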
<p>In truth, <code>main</code> is not this program's entrypoint. If we
run <code>objdump -x a.out</code> we can see that the ELF entrypoint
is <code>0x401020</code>.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-x<span class="w"> </span>a.out
a.out:<span class="w"> </span>file<span class="w"> </span>format<span class="w"> </span>elf64-x86-64
a.out
architecture:<span class="w"> </span>i386:x86-64,<span class="w"> </span>flags<span class="w"> </span>0x00000112:
EXEC_P,<span class="w"> </span>HAS_SYMS,<span class="w"> </span>D_PAGED
start<span class="w"> </span>address<span class="w"> </span>0x0000000000401020
Program<span class="w"> </span>Header:
<span class="w"> </span>PHDR<span class="w"> </span>off<span class="w"> </span>0x0000000000000040<span class="w"> </span>vaddr<span class="w"> </span>0x0000000000400040<span class="w"> </span>paddr<span class="w"> </span>0x0000000000400040<span class="w"> </span>align<span class="w"> </span><span class="m">2</span>**3
<span class="w"> </span>filesz<span class="w"> </span>0x00000000000002d8<span class="w"> </span>memsz<span class="w"> </span>0x00000000000002d8<span class="w"> </span>flags<span class="w"> </span>r--
</pre></div>
<p>This is because the actual entrypoint gcc sets up is a function called
<code>_start</code>. The libc prelude beginning
with <code>_start</code> is responsible for initializing the libc
runtime, calling our <code>main</code> function and executing the exit
syscall with the return value of <code>main</code>.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-d<span class="w"> </span>a.out<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-A10<span class="w"> </span><span class="s1">'<_start>'</span>
<span class="m">0000000000401020</span><span class="w"> </span><_start>:
<span class="w"> </span><span class="m">401020</span>:<span class="w"> </span>f3<span class="w"> </span>0f<span class="w"> </span>1e<span class="w"> </span>fa<span class="w"> </span>endbr64
<span class="w"> </span><span class="m">401024</span>:<span class="w"> </span><span class="m">31</span><span class="w"> </span>ed<span class="w"> </span>xor<span class="w"> </span>%ebp,%ebp
<span class="w"> </span><span class="m">401026</span>:<span class="w"> </span><span class="m">49</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>d1<span class="w"> </span>mov<span class="w"> </span>%rdx,%r9
<span class="w"> </span><span class="m">401029</span>:<span class="w"> </span>5e<span class="w"> </span>pop<span class="w"> </span>%rsi
<span class="w"> </span>40102a:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e2<span class="w"> </span>mov<span class="w"> </span>%rsp,%rdx
<span class="w"> </span>40102d:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">83</span><span class="w"> </span>e4<span class="w"> </span>f0<span class="w"> </span>and<span class="w"> </span><span class="nv">$0</span>xfffffffffffffff0,%rsp
<span class="w"> </span><span class="m">401031</span>:<span class="w"> </span><span class="m">50</span><span class="w"> </span>push<span class="w"> </span>%rax
<span class="w"> </span><span class="m">401032</span>:<span class="w"> </span><span class="m">54</span><span class="w"> </span>push<span class="w"> </span>%rsp
<span class="w"> </span><span class="m">401033</span>:<span class="w"> </span><span class="m">49</span><span class="w"> </span>c7<span class="w"> </span>c0<span class="w"> </span><span class="m">90</span><span class="w"> </span><span class="m">11</span><span class="w"> </span><span class="m">40</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>mov<span class="w"> </span><span class="nv">$0</span>x401190,%r8
<span class="w"> </span>40103a:<span class="w"> </span><span class="m">48</span><span class="w"> </span>c7<span class="w"> </span>c1<span class="w"> </span><span class="m">20</span><span class="w"> </span><span class="m">11</span><span class="w"> </span><span class="m">40</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>mov<span class="w"> </span><span class="nv">$0</span>x401120,%rcx
</pre></div>
<p>But because all this libc initialization is relatively complicated
we're just going to skip the actual ELF entrypoint for now. Our
emulator will locate <code>main</code>, load the binary into memory,
jump to the start of <code>main</code>, and set the exit code of the
emulator to the result of main.</p>
<p class="note">
As you can see, this ELF binary has its own hard-coded view of where
it will be in memory. What if our CPU were to run multiple processes
at once? We might give each process its own virtual memory space
and map back to a real memory space so each process (and by
extension, compilers) doesn't have to think about how they fit into
memory relative to other processes.
</p><p>The last question to figure out is where to load the ELF binary into
emulator memory so that addresses in memory are where the program
expects.</p>
<p>As it turns out, the ELF metadata includes section
headers, each of which contains an address and an offset from the start of the ELF
file. By subtracting the offset from the address we can get the location the file expects to
be in memory.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-x<span class="w"> </span>a.out
a.out:<span class="w"> </span>file<span class="w"> </span>format<span class="w"> </span>elf64-x86-64
a.out
architecture:<span class="w"> </span>i386:x86-64,<span class="w"> </span>flags<span class="w"> </span>0x00000112:
EXEC_P,<span class="w"> </span>HAS_SYMS,<span class="w"> </span>D_PAGED
start<span class="w"> </span>address<span class="w"> </span>0x0000000000401020
Program<span class="w"> </span>Header:
<span class="w"> </span>PHDR<span class="w"> </span>off<span class="w"> </span>0x0000000000000040<span class="w"> </span>vaddr<span class="w"> </span>0x0000000000400040<span class="w"> </span>paddr<span class="w"> </span>0x0000000000400040<span class="w"> </span>align<span class="w"> </span><span class="m">2</span>**3
<span class="w"> </span>filesz<span class="w"> </span>0x00000000000002d8<span class="w"> </span>memsz<span class="w"> </span>0x00000000000002d8<span class="w"> </span>flags<span class="w"> </span>r--
</pre></div>
<p>That is: <code>0x400040 (vaddr) - 0x40 (off) = 0x400000</code>.
Judging from a Google search this seems to be a pretty common address
where ELF binaries are loaded into memory.</p>
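<p>The subtraction above, and what it buys us, can be sketched in a few
lines of Go. The function names here are hypothetical, and
<code>fileIndex</code> assumes the whole file is copied contiguously to
the base address, which fits this simple binary but is not true of ELF
files in general.</p>

```go
package main

import "fmt"

// loadAddress computes where the file expects to start in memory,
// given one header's virtual address and its offset within the file.
func loadAddress(vaddr, offset uint64) uint64 {
	return vaddr - offset
}

// fileIndex converts a virtual address into an index into the file's
// bytes. Assumption: the entire file is loaded contiguously at base.
func fileIndex(addr, base uint64) uint64 {
	return addr - base
}

func main() {
	// vaddr and off come from the header shown in the objdump output.
	base := loadAddress(0x400040, 0x40)
	// 0x401106 is the address of main found earlier.
	fmt.Printf("base=%#x main at index %#x\n", base, fileIndex(0x401106, base))
}
```

<p>Under that assumption, the first byte of <code>main</code> sits at
index <code>0x1106</code> of the file contents.</p>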
<h3 id="elf-and-go">ELF and Go</h3><p>Binary file formats tend to be a pain to work with because, to enable
greater compression, everything ends up being a pointer to something
else. So you end up jumping all around the file just to stitch
information back together.</p>
<p>So the one third-party-ish library we'll use is Go's builtin
<code>debug/elf</code> package. With this library we can load an ELF
binary and iterate over symbols and sections to discover the location
of <code>main</code> and the start address for the binary in memory.</p>
<p>Editing in <code>main.go</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"bytes"</span>
<span class="w"> </span><span class="s">"debug/elf"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"log"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">process</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">startAddress</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="nx">entryPoint</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="nx">bin</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">readELF</span><span class="p">(</span><span class="nx">filename</span><span class="p">,</span><span class="w"> </span><span class="nx">entrySymbol</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bin</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">ReadFile</span><span class="p">(</span><span class="nx">filename</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">elffile</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">NewFile</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">NewReader</span><span class="p">(</span><span class="nx">bin</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">symbols</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">elffile</span><span class="p">.</span><span class="nx">Symbols</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">entryPoint</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">sym</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">symbols</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">sym</span><span class="p">.</span><span class="nx">Name</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">entrySymbol</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">STT_FUNC</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">ST_TYPE</span><span class="p">(</span><span class="nx">sym</span><span class="p">.</span><span class="nx">Info</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">STB_GLOBAL</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">ST_BIND</span><span class="p">(</span><span class="nx">sym</span><span class="p">.</span><span class="nx">Info</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">entryPoint</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">sym</span><span class="p">.</span><span class="nx">Value</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">entryPoint</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not find entrypoint symbol: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">entrySymbol</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">startAddress</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">sec</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">elffile</span><span class="p">.</span><span class="nx">Sections</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">sec</span><span class="p">.</span><span class="nx">Type</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">SHT_NULL</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">startAddress</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">sec</span><span class="p">.</span><span class="nx">Addr</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">sec</span><span class="p">.</span><span class="nx">Offset</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">startAddress</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Could not determine start address"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">process</span><span class="p">{</span>
<span class="w"> </span><span class="nx">startAddress</span><span class="p">:</span><span class="w"> </span><span class="nx">startAddress</span><span class="p">,</span>
<span class="w"> </span><span class="nx">entryPoint</span><span class="p">:</span><span class="w"> </span><span class="nx">entryPoint</span><span class="p">,</span>
<span class="w"> </span><span class="nx">bin</span><span class="p">:</span><span class="w"> </span><span class="nx">bin</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">)</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Binary not provided"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">proc</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readELF</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="s">"main"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Start: 0x%x\nEntry: 0x%x\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">proc</span><span class="p">.</span><span class="nx">startAddress</span><span class="p">,</span><span class="w"> </span><span class="nx">proc</span><span class="p">.</span><span class="nx">entryPoint</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>We can test on a basic compiled C program:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>tests/simple.c
int<span class="w"> </span>main<span class="o">()</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="m">4</span><span class="p">;</span>
<span class="o">}</span>
$<span class="w"> </span>gcc<span class="w"> </span>tests/simple.c
$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main
$<span class="w"> </span>./main<span class="w"> </span>a.out
Start:<span class="w"> </span>0x400000
Entry:<span class="w"> </span>0x401106
</pre></div>
<p>And verify against <code>objdump</code>:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-d<span class="w"> </span>a.out<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-A10<span class="w"> </span><span class="s1">'<main>'</span>
<span class="m">0000000000401106</span><span class="w"> </span><main>:
<span class="w"> </span><span class="m">401106</span>:<span class="w"> </span><span class="m">55</span><span class="w"> </span>push<span class="w"> </span>%rbp
</pre></div>
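<p>The <code>sec.Addr - sec.Offset</code> arithmetic in
<code>readELF</code> can be checked by hand. Here is a minimal sketch,
assuming a section with a virtual address of <code>0x401000</code> at
file offset <code>0x1000</code>. Those two values are hypothetical,
chosen only to be consistent with the output above, and the
<code>section</code>, <code>loadBase</code>, and <code>fileOffset</code>
names are made up for illustration.</p>

```go
package main

import "fmt"

// section holds the two ELF section header fields used by readELF.
// This is a hypothetical stand-in for elf.Section.
type section struct {
	addr   uint64 // virtual address of the section once loaded
	offset uint64 // byte offset of the section within the file
}

// loadBase returns the virtual address that file offset 0 maps to.
func loadBase(s section) uint64 {
	return s.addr - s.offset
}

// fileOffset converts a virtual address back into an offset in the file.
func fileOffset(vaddr, base uint64) uint64 {
	return vaddr - base
}

func main() {
	// Assumed values, not read from a real binary.
	text := section{addr: 0x401000, offset: 0x1000}
	base := loadBase(text)
	fmt.Printf("base: 0x%x\n", base)
	fmt.Printf("main at file offset: 0x%x\n", fileOffset(0x401106, base))
}
```

<p>Because the whole file is later copied into memory starting at the
base address, a symbol's virtual address and its position in the file
differ only by that base.</p>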
<p>And that's it for dealing with ELF. Now we can sketch out a virtual
CPU and how we deal with interpreting instructions starting at this
address.</p>
<h3 id="the-cpu">The CPU</h3><p>AMD64 counts on being able to store values in registers and memory,
sometimes through direct addressing and sometimes indirectly using
stack operations (push and pop). And userland processes count on being
loaded into memory so the CPU can jump to the process entrypoint
and start executing.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">cpu</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">proc</span><span class="w"> </span><span class="o">*</span><span class="nx">process</span>
<span class="w"> </span><span class="nx">mem</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="w"> </span><span class="nx">regfile</span><span class="w"> </span><span class="o">*</span><span class="nx">registerFile</span>
<span class="w"> </span><span class="nx">tick</span><span class="w"> </span><span class="kd">chan</span><span class="w"> </span><span class="kt">bool</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">newCPU</span><span class="p">(</span><span class="nx">memory</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="nx">cpu</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">cpu</span><span class="p">{</span>
<span class="w"> </span><span class="nx">mem</span><span class="p">:</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">memory</span><span class="p">),</span>
<span class="w"> </span><span class="nx">regfile</span><span class="p">:</span><span class="w"> </span><span class="o">&</span><span class="nx">registerFile</span><span class="p">{},</span>
<span class="w"> </span><span class="nx">tick</span><span class="p">:</span><span class="w"> </span><span class="nb">make</span><span class="p">(</span><span class="kd">chan</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>The <code>tick</code> channel is so that later on we can wrap the
emulator in a terminal debugger. But by default we'll run the CPU in a
goroutine and have <code>main</code> send ticks forever.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">repl</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// TODO</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">)</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">"Binary not provided"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">proc</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readELF</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="s">"main"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">debug</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"--debug"</span><span class="p">:</span>
<span class="w"> </span><span class="k">fallthrough</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"-d"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">debug</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// 40 MiB (0x400000 is 4 MiB)</span>
<span class="w"> </span><span class="nx">cpu</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newCPU</span><span class="p">(</span><span class="mh">0x400000</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span>
<span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">cpu</span><span class="p">.</span><span class="nx">run</span><span class="p">(</span><span class="nx">proc</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">debug</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">repl</span><span class="p">(</span><span class="o">&</span><span class="nx">cpu</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cpu</span><span class="p">.</span><span class="nx">tick</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
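<p>To make the role of the channel concrete, here is a rough sketch of
how a debugger-style "step" command could drive a worker that blocks on
<code>tick</code>. The <code>stepper</code> type, <code>stepN</code>,
and the <code>done</code> channel are hypothetical stand-ins, not part
of the emulator.</p>

```go
package main

import "fmt"

// stepper is a hypothetical stand-in for the emulator's cpu: it advances
// one step each time a value arrives on tick.
type stepper struct {
	tick  chan bool
	steps int
}

// run waits for a tick before every step and closes done after limit steps.
func (s *stepper) run(limit int, done chan<- struct{}) {
	for s.steps < limit {
		<-s.tick
		s.steps++
	}
	close(done)
}

// stepN plays the role of a debugger "step n" command by sending n ticks.
func stepN(s *stepper, n int) {
	for i := 0; i < n; i++ {
		s.tick <- true
	}
}

func main() {
	s := &stepper{tick: make(chan bool, 1)}
	done := make(chan struct{})
	go s.run(3, done)

	stepN(s, 3)
	<-done // wait for the worker before reading its state
	fmt.Println("steps executed:", s.steps)
}
```

<p>The worker never advances on its own, so whoever owns the sending
side of the channel decides the pace: a tight loop for normal runs, or
one tick per command in a debugger.</p>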
<h3 id="registers">Registers</h3><p>To emulate a simple program like our <code>tests/simple.c</code>,
we'll only need to support a few common registers. The order is
important so that we can use the Go identifiers when we want to refer
to the <a href="https://wiki.osdev.org/X86-64_Instruction_Encoding#Registers">encoded integer value of the
register</a>.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">register</span><span class="w"> </span><span class="kt">int</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="c1">// These are in order of encoding value (e.g. rbp is 5)</span>
<span class="w"> </span><span class="nx">rax</span><span class="w"> </span><span class="nx">register</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>
<span class="w"> </span><span class="nx">rcx</span>
<span class="w"> </span><span class="nx">rdx</span>
<span class="w"> </span><span class="nx">rbx</span>
<span class="w"> </span><span class="nx">rsp</span>
<span class="w"> </span><span class="nx">rbp</span>
<span class="w"> </span><span class="nx">rsi</span>
<span class="w"> </span><span class="nx">rdi</span>
<span class="w"> </span><span class="nx">r8</span>
<span class="w"> </span><span class="nx">r9</span>
<span class="w"> </span><span class="nx">r10</span>
<span class="w"> </span><span class="nx">r11</span>
<span class="w"> </span><span class="nx">r12</span>
<span class="w"> </span><span class="nx">r13</span>
<span class="w"> </span><span class="nx">r14</span>
<span class="w"> </span><span class="nx">r15</span>
<span class="w"> </span><span class="nx">rip</span>
<span class="w"> </span><span class="nx">rflags</span>
<span class="p">)</span>
<span class="kd">var</span><span class="w"> </span><span class="nx">registerMap</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="nx">register</span><span class="p">]</span><span class="kt">string</span><span class="p">{</span>
<span class="w"> </span><span class="nx">rax</span><span class="p">:</span><span class="w"> </span><span class="s">"rax"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">rcx</span><span class="p">:</span><span class="w"> </span><span class="s">"rcx"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">rdx</span><span class="p">:</span><span class="w"> </span><span class="s">"rdx"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">rbx</span><span class="p">:</span><span class="w"> </span><span class="s">"rbx"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">rsp</span><span class="p">:</span><span class="w"> </span><span class="s">"rsp"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">rbp</span><span class="p">:</span><span class="w"> </span><span class="s">"rbp"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">rsi</span><span class="p">:</span><span class="w"> </span><span class="s">"rsi"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">rdi</span><span class="p">:</span><span class="w"> </span><span class="s">"rdi"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">r8</span><span class="p">:</span><span class="w"> </span><span class="s">"r8"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">r9</span><span class="p">:</span><span class="w"> </span><span class="s">"r9"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">r10</span><span class="p">:</span><span class="w"> </span><span class="s">"r10"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">r11</span><span class="p">:</span><span class="w"> </span><span class="s">"r11"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">r12</span><span class="p">:</span><span class="w"> </span><span class="s">"r12"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">r13</span><span class="p">:</span><span class="w"> </span><span class="s">"r13"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">r14</span><span class="p">:</span><span class="w"> </span><span class="s">"r14"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">r15</span><span class="p">:</span><span class="w"> </span><span class="s">"r15"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">rip</span><span class="p">:</span><span class="w"> </span><span class="s">"rip"</span><span class="p">,</span>
<span class="w"> </span><span class="nx">rflags</span><span class="p">:</span><span class="w"> </span><span class="s">"rflags"</span><span class="p">,</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">registerFile</span><span class="w"> </span><span class="p">[</span><span class="mi">18</span><span class="p">]</span><span class="kt">uint64</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">regfile</span><span class="w"> </span><span class="o">*</span><span class="nx">registerFile</span><span class="p">)</span><span class="w"> </span><span class="nx">get</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="nx">register</span><span class="p">)</span><span class="w"> </span><span class="kt">uint64</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">regfile</span><span class="p">[</span><span class="nx">r</span><span class="p">]</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">regfile</span><span class="w"> </span><span class="o">*</span><span class="nx">registerFile</span><span class="p">)</span><span class="w"> </span><span class="nx">set</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="nx">register</span><span class="p">,</span><span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">regfile</span><span class="p">[</span><span class="nx">r</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">v</span>
<span class="p">}</span>
</pre></div>
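<p>As a small illustration of why the ordering matters: the one-byte
<code>push</code> instruction encodes its register in the opcode itself
(<code>0x50</code> plus the register number), so the <code>55</code>
byte seen in the <code>objdump</code> output earlier is
<code>0x50 + 5</code>, or <code>push %rbp</code>. The sketch below
assumes no REX prefix and uses a hypothetical <code>pushOperand</code>
helper and a trimmed copy of the register constants.</p>

```go
package main

import "fmt"

type register int

const (
	// Same order as the encoding values, trimmed to the first eight.
	rax register = iota
	rcx
	rdx
	rbx
	rsp
	rbp
	rsi
	rdi
)

var names = [...]string{"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"}

// pushOperand decodes the register from a one-byte push opcode
// (0x50 through 0x57). ok is false for any other byte.
func pushOperand(opcode byte) (r register, ok bool) {
	if opcode < 0x50 || opcode > 0x57 {
		return 0, false
	}
	// The low bits of the opcode are the register's encoding value,
	// which is also its position in the const block.
	return register(opcode - 0x50), true
}

func main() {
	r, ok := pushOperand(0x55)
	fmt.Println(names[r], ok)
}
```

<p>No lookup table is needed to go from instruction bytes to a
<code>register</code> value, because the constant's integer value is the
encoding.</p>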
<p>Of immediate importance will be the <code>rip</code>, <code>rsp</code>,
and <code>rax</code> registers. <code>rip</code> is used to track the
current instruction to process. It will generally be incremented
except for when dealing with function calls and
returns. <code>rsp</code> is used as a pointer to the top of a stack
in memory. Since the stack grows down, it is decremented as values are
pushed and incremented as they are popped. Finally, <code>rax</code> is
used to pass function return values.</p>
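<p>Here is a minimal sketch of how <code>rsp</code> moves during a push
and a pop, assuming 8-byte values stored little-endian in a flat byte
slice. The <code>machine</code> type and its methods are hypothetical
and only meant to show the direction of the pointer movement.</p>

```go
package main

import "fmt"

// machine is a hypothetical, stripped-down model: flat memory plus a
// stack pointer.
type machine struct {
	mem []byte
	rsp uint64
}

// push moves rsp down by 8 and stores v there, least significant byte first.
func (m *machine) push(v uint64) {
	m.rsp -= 8
	for i := uint64(0); i < 8; i++ {
		m.mem[m.rsp+i] = byte(v >> (8 * i))
	}
}

// pop reads the 8 bytes at rsp and then moves rsp back up by 8.
func (m *machine) pop() uint64 {
	var v uint64
	for i := uint64(0); i < 8; i++ {
		v |= uint64(m.mem[m.rsp+i]) << (8 * i)
	}
	m.rsp += 8
	return v
}

func main() {
	m := &machine{mem: make([]byte, 64), rsp: 56}
	m.push(0xdeadbeef)
	fmt.Printf("after push: rsp=%d\n", m.rsp)
	v := m.pop()
	fmt.Printf("after pop: rsp=%d value=0x%x\n", m.rsp, v)
}
```

<p>A push followed by a pop leaves <code>rsp</code> where it started,
which is what lets a function return to the address its caller
pushed.</p>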
<h3 id="loading-a-program">Loading a program</h3><p>Running a program is a matter of loading the program into memory,
setting the stack pointer to the last 8-byte slot of memory (in x86 the
stack grows down), pointing <code>rip</code> at the entrypoint, and
looping until the entrypoint function returns.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">writeBytes</span><span class="p">(</span><span class="nx">to</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">start</span><span class="w"> </span><span class="kt">uint64</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">bytes</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">to</span><span class="p">[</span><span class="nx">start</span><span class="o">+</span><span class="nb">uint64</span><span class="p">(</span><span class="nx">i</span><span class="p">)]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">byte</span><span class="p">(</span><span class="nx">val</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="p">(</span><span class="mi">8</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0xFF</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">loop</span><span class="p">(</span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o"><-</span><span class="nx">c</span><span class="p">.</span><span class="nx">tick</span>
<span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rip</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">]</span>
<span class="w"> </span><span class="c1">// TODO: deal with instructions</span>
<span class="w"> </span><span class="c1">// move to next instruction</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rip</span><span class="p">,</span><span class="w"> </span><span class="nx">ip</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">run</span><span class="p">(</span><span class="nx">proc</span><span class="w"> </span><span class="o">*</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">copy</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">proc</span><span class="p">.</span><span class="nx">startAddress</span><span class="p">:</span><span class="nx">proc</span><span class="p">.</span><span class="nx">startAddress</span><span class="o">+</span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">proc</span><span class="p">.</span><span class="nx">bin</span><span class="p">))],</span><span class="w"> </span><span class="nx">proc</span><span class="p">.</span><span class="nx">bin</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rip</span><span class="p">,</span><span class="w"> </span><span class="nx">proc</span><span class="p">.</span><span class="nx">entryPoint</span><span class="p">)</span>
<span class="w"> </span><span class="nx">initialStackPointer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">)</span><span class="o">-</span><span class="mi">8</span><span class="p">)</span>
<span class="w"> </span><span class="nx">writeBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">initialStackPointer</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="nx">initialStackPointer</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rsp</span><span class="p">,</span><span class="w"> </span><span class="nx">initialStackPointer</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">loop</span><span class="p">(</span><span class="nx">initialStackPointer</span><span class="p">)</span>
<span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Exit</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rax</span><span class="p">)))</span>
<span class="p">}</span>
</pre></div>
<p>We write the initial stack pointer address into the stack so that when
the program finally returns, it will return to this address, at which
point we can exit the program.</p>
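<p>To make the sentinel idea concrete, here is a minimal sketch that is
separate from the emulator code. The helpers <code>pushSentinel</code> and
<code>simulateReturn</code> are hypothetical names, and the sketch assumes
values are stored little-endian, as they are on x86-64.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// pushSentinel is a hypothetical helper: it stores the address of the top
// stack slot inside that slot and returns the address.
func pushSentinel(mem []byte) uint64 {
	sentinel := uint64(len(mem) - 8)
	binary.LittleEndian.PutUint64(mem[sentinel:], sentinel)
	return sentinel
}

// simulateReturn is a hypothetical helper that mimics a return: it pops
// 8 bytes off the stack and treats them as the next instruction pointer.
func simulateReturn(mem []byte, sp uint64) (ip, newSP uint64) {
	ip = binary.LittleEndian.Uint64(mem[sp : sp+8])
	return ip, sp + 8
}

func main() {
	mem := make([]byte, 64)
	sentinel := pushSentinel(mem)
	ip, sp := simulateReturn(mem, sentinel)
	// When ip equals the sentinel, the entry function has returned.
	fmt.Println(ip == sentinel, sp)
}
```

<p>Once the popped address equals the sentinel, the main loop can treat
that as the signal to stop.</p>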
<p>And now we're ready to start interpreting instructions.</p>
<h3 id="instruction-decoding">Instruction decoding</h3><p>Using <code>objdump</code> we get a sense for what the program decodes
to.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-d<span class="w"> </span>a.out<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-A10<span class="w"> </span><span class="s1">'<main>'</span>
<span class="m">0000000000401106</span><span class="w"> </span><main>:
<span class="w"> </span><span class="m">401106</span>:<span class="w"> </span><span class="m">55</span><span class="w"> </span>push<span class="w"> </span>%rbp
<span class="w"> </span><span class="m">401107</span>:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>mov<span class="w"> </span>%rsp,%rbp
<span class="w"> </span>40110a:<span class="w"> </span>b8<span class="w"> </span>fe<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>mov<span class="w"> </span><span class="nv">$0</span>xfe,%eax
<span class="w"> </span>40110f:<span class="w"> </span>5d<span class="w"> </span>pop<span class="w"> </span>%rbp
<span class="w"> </span><span class="m">401110</span>:<span class="w"> </span>c3<span class="w"> </span>retq
<span class="w"> </span><span class="m">401111</span>:<span class="w"> </span><span class="m">66</span><span class="w"> </span>2e<span class="w"> </span>0f<span class="w"> </span>1f<span class="w"> </span><span class="m">84</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>nopw<span class="w"> </span>%cs:0x0<span class="o">(</span>%rax,%rax,1<span class="o">)</span>
<span class="w"> </span><span class="m">401118</span>:<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span>
<span class="w"> </span>40111b:<span class="w"> </span>0f<span class="w"> </span>1f<span class="w"> </span><span class="m">44</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>nopl<span class="w"> </span>0x0<span class="o">(</span>%rax,%rax,1<span class="o">)</span>
<span class="m">0000000000401120</span><span class="w"> </span><__libc_csu_init>:
</pre></div>
<p>We see that <code>0x55</code> means <code>push
%rbp</code>. And we also see that instructions aren't a fixed number
of bytes. Some are one byte, some are seven. Some (not shown) are <a href="https://stackoverflow.com/questions/14698350/x86-64-asm-maximum-bytes-for-an-instruction">even
longer</a>.</p>
<p>Thankfully instructions follow some fairly simple patterns. There is
a set of prefix instructions and a set of real instructions. So far we
should be able to tell from the first byte whether the instruction is a
prefix instruction and, if not, how many bytes the instruction will
take up in total.</p>
<h4 id="push">push</h4><p>To support a new instruction, we'll look up <code>0x55</code> in an
opcode table like <a href="http://ref.x86asm.net/coder64.html">this</a>. Clicking
on <a href="http://ref.x86asm.net/coder64.html#x50">55</a> in the opcode index we
see that this is indeed a push instruction. <code>50+r</code> means
that we have to subtract <code>0x50</code> from the opcode to
determine the register we should push.</p>
<p>The register will be <code>0x55 - 0x50 = 5</code> which if we look up
in a <a href="https://wiki.osdev.org/X86-64_Instruction_Encoding#Registers">register
table</a>
is <code>rbp</code>. Since we set up our register enum in code in this
order, we'll be able to just use the constant <code>rbp</code> in Go
code.</p>
<p>Finally, since the next instruction numerically is <code>0x58</code>
we know that this instruction is identified by being between
<code>0x50</code> and <code>0x57</code> inclusive. This is all the
info we need to handle this instruction.</p>
<div class="highlight"><pre><span></span><span class="c1">// helper for dumping byte arrays as hex</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">hbdebug</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">str</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">"%s:"</span>
<span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="kd">interface</span><span class="p">{}{</span><span class="nx">msg</span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">bs</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">str</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">str</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">" %x"</span>
<span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="nx">str</span><span class="o">+</span><span class="s">"\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="o">...</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">loop</span><span class="p">(</span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o"><-</span><span class="nx">c</span><span class="p">.</span><span class="nx">tick</span>
<span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rip</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mh">0x50</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mh">0x58</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// push</span>
<span class="w"> </span><span class="nx">regvalue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">register</span><span class="p">(</span><span class="nx">inb1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mh">0x50</span><span class="p">))</span>
<span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rsp</span><span class="p">)</span>
<span class="w"> </span><span class="nx">writeBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span class="o">-</span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="nx">regvalue</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rsp</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">sp</span><span class="o">-</span><span class="mi">8</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">hbdebug</span><span class="p">(</span><span class="s">"prog"</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">:</span><span class="nx">ip</span><span class="o">+</span><span class="mi">10</span><span class="p">])</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Unknown instruction"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rip</span><span class="p">,</span><span class="w"> </span><span class="nx">ip</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>If we try this out now we should expect it to panic on the second
byte, <code>0x48</code>.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main
$<span class="w"> </span>./main<span class="w"> </span>a.out
prog:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>b8<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>5d<span class="w"> </span>c3
panic:<span class="w"> </span>Unknown<span class="w"> </span>instruction
goroutine<span class="w"> </span><span class="m">19</span><span class="w"> </span><span class="o">[</span>running<span class="o">]</span>:
main.<span class="o">(</span>*cpu<span class="o">)</span>.loop<span class="o">(</span>0xc000086c30,<span class="w"> </span>0x2800000<span class="o">)</span>
<span class="w"> </span>/home/phil/tmp/goamd/main.go:168<span class="w"> </span>+0x16d
main.<span class="o">(</span>*cpu<span class="o">)</span>.run<span class="o">(</span>0xc000086c30,<span class="w"> </span>0xc000086c00<span class="o">)</span>
<span class="w"> </span>/home/phil/tmp/goamd/main.go:180<span class="w"> </span>+0xac
created<span class="w"> </span>by<span class="w"> </span>main.main
<span class="w"> </span>/home/phil/tmp/goamd/main.go:211<span class="w"> </span>+0x286
</pre></div>
<p>Looking good.</p>
<h4 id="mov">mov</h4><p>Taking a look at the next two instructions with <code>objdump</code>
we see <code>mov</code> encoded two different ways.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-d<span class="w"> </span>a.out<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-A4<span class="w"> </span><span class="s1">'<main>'</span>
<span class="m">0000000000401106</span><span class="w"> </span><main>:
<span class="w"> </span><span class="m">401106</span>:<span class="w"> </span><span class="m">55</span><span class="w"> </span>push<span class="w"> </span>%rbp
<span class="w"> </span><span class="m">401107</span>:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>mov<span class="w"> </span>%rsp,%rbp
<span class="w"> </span>40110a:<span class="w"> </span>b8<span class="w"> </span>fe<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>mov<span class="w"> </span><span class="nv">$0</span>xfe,%eax
</pre></div>
<p>Looking up <a href="http://ref.x86asm.net/coder64.html#x48">0x48</a> we see that
this is a prefix instruction (REX.W) that selects 64-bit operands for the
instruction that follows. Some instructions like <code>pop</code> and
<code>push</code> don't need this prefix to operate on 64-bit values. In any
case, this just means we'll need a size flag that switches
from 32-bit to 64-bit mode on seeing this prefix. This flag will
be reset each time we start reading an instruction.</p>
<p>To deal with prefixes in general we'll loop through bytes when
processing an instruction until we no longer see a prefix byte. As we
see prefix bytes we'll handle them accordingly.</p>
<div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">prefixBytes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">{</span><span class="mh">0x48</span><span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">loop</span><span class="p">(</span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o"><-</span><span class="nx">c</span><span class="p">.</span><span class="nx">tick</span>
<span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rip</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">]</span>
<span class="w"> </span><span class="nx">widthPrefix</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">32</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">isPrefixByte</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">prefixByte</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">prefixBytes</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">prefixByte</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">isPrefixByte</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isPrefixByte</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// 64 bit prefix signifier</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mh">0x48</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">widthPrefix</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">64</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">hbdebug</span><span class="p">(</span><span class="s">"prog"</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">:</span><span class="nx">ip</span><span class="o">+</span><span class="mi">10</span><span class="p">])</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Unknown prefix instruction"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">ip</span><span class="o">++</span>
<span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mh">0x50</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mh">0x58</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// push</span>
<span class="o">...</span>
</pre></div>
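<p>As an aside, <code>0x48</code> is one member of the REX prefix family,
which occupies <code>0x40</code> through <code>0x4f</code>; bit 3 of the
prefix (the W bit) is what selects 64-bit operands. The sketch below is
not part of the emulator, and the helper names <code>isREX</code> and
<code>operandWidth</code> are made up for illustration. It shows one way
the hard-coded prefix list could be generalized.</p>

```go
package main

import "fmt"

// isREX reports whether b lies in the REX prefix range 0x40-0x4f.
func isREX(b byte) bool {
	return b&0xF0 == 0x40
}

// operandWidth returns 64 when b is a REX prefix with the W bit (bit 3)
// set, and the 32-bit default otherwise.
func operandWidth(b byte) int {
	if isREX(b) && b&0b1000 != 0 {
		return 64
	}
	return 32
}

func main() {
	for _, b := range []byte{0x48, 0x41, 0x55} {
		fmt.Printf("%#x rex=%t width=%d\n", b, isREX(b), operandWidth(b))
	}
}
```

<p>Here <code>0x41</code> is a REX prefix that leaves the width at 32 bits,
while <code>0x55</code> (our <code>push</code>) is not a prefix at all.</p>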
<p>Moving past this prefix we get to
<a href="http://ref.x86asm.net/coder64.html#x89">0x89</a>. This instruction is
for copying one register into another. The register operands are
<a href="http://www.c-jump.com/CIS77/CPU/x86/X77_0270_modrm_byte.htm">encoded in the second
byte</a>,
<code>0xe5</code>, called the ModR/M byte. Pulling out the two
registers is just a matter of shifting and bitmasking the right 3 bits
for each.</p>
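<p>As a worked example, here is a small standalone sketch that splits
<code>0xe5</code> into its three fields. The helper name
<code>splitModRM</code> and the <code>regNames</code> table are
illustrative only; the table assumes the standard register numbering
from the register table linked above.</p>

```go
package main

import "fmt"

// regNames lists the first eight 64-bit registers in encoding order.
var regNames = [8]string{"rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"}

// splitModRM is a hypothetical helper that splits a ModR/M byte into its
// mod (top 2 bits), reg (middle 3 bits) and rm (low 3 bits) fields.
func splitModRM(b byte) (mod, reg, rm byte) {
	return b >> 6, (b >> 3) & 0b111, b & 0b111
}

func main() {
	mod, reg, rm := splitModRM(0xe5)
	// 0xe5 is 11 100 101 in binary.
	fmt.Printf("mod=%02b reg=%s rm=%s\n", mod, regNames[reg], regNames[rm])
}
```

<p>For opcode <code>0x89</code> the <code>reg</code> field is the source
and the <code>rm</code> field is the destination, which matches
<code>mov %rsp,%rbp</code>. A <code>mod</code> of <code>11</code> means
both operands are plain registers, which is the only case this program
needs.</p>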
<p>With this knowledge we can expand the instruction handling code.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mh">0x50</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mh">0x58</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// push</span>
<span class="w"> </span><span class="nx">regvalue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">register</span><span class="p">(</span><span class="nx">inb1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mh">0x50</span><span class="p">))</span>
<span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rsp</span><span class="p">)</span>
<span class="w"> </span><span class="nx">writeBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span class="o">-</span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="nx">regvalue</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rsp</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">sp</span><span class="o">-</span><span class="mi">8</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mh">0x89</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// mov r/m16/32/64, r16/32/64</span>
<span class="w"> </span><span class="nx">ip</span><span class="o">++</span>
<span class="w"> </span><span class="nx">inb2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">]</span>
<span class="w"> </span><span class="nx">rhs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">register</span><span class="p">((</span><span class="nx">inb2</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mi">0</span><span class="nx">b00111000</span><span class="p">)</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">3</span><span class="p">)</span>
<span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">register</span><span class="p">(</span><span class="nx">inb2</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mi">0</span><span class="nx">b111</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">lhs</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rhs</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">hbdebug</span><span class="p">(</span><span class="s">"prog"</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">:</span><span class="nx">ip</span><span class="o">+</span><span class="mi">10</span><span class="p">])</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Unknown instruction"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Try emulating <code>a.out</code> again now. It will panic on the next
unknown instruction, <code>0xb8</code>. From the <code>objdump</code>
disassembly we see this is another <code>mov</code> instruction.</p>
<p>Hurray! There are apparently multiple ways the same instruction can be
encoded. Looking it up in the opcode table, we see
<a href="http://ref.x86asm.net/coder64.html#xB8">0xB8</a> is for when the value
to be copied is a literal number. The operand will be 32 bits, or four
bytes, presumably because it doesn't have the <code>0x48</code>
prefix.</p>
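<p>To see what that means for the bytes <code>b8 fe 00 00 00</code>, here
is a standalone sketch using the standard library's
<code>encoding/binary</code> package. The function name
<code>decodeMovImm32</code> is made up for this example, and it only
handles the 32-bit form without a prefix.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeMovImm32 is a hypothetical helper for the unprefixed B8+r form:
// the register index is the opcode minus 0xB8 and the next four bytes
// hold the immediate value, least significant byte first.
func decodeMovImm32(ins []byte) (reg byte, imm uint32) {
	reg = ins[0] - 0xB8
	imm = binary.LittleEndian.Uint32(ins[1:5])
	return reg, imm
}

func main() {
	reg, imm := decodeMovImm32([]byte{0xb8, 0xfe, 0x00, 0x00, 0x00})
	fmt.Printf("reg=%d imm=%#x\n", reg, imm)
}
```

<p>Register 0 is <code>rax</code> (of which <code>eax</code> is the lower
half) and the immediate is <code>0xfe</code>, matching the
<code>objdump</code> output. We'll write our own helper instead so that
the number of bytes to read can vary with the width flag.</p>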
<div class="highlight"><pre><span></span><span class="c1">// helper for converting up to 8 bytes into a single integer</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">from</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">start</span><span class="w"> </span><span class="kt">uint64</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="kt">uint64</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">bytes</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">|=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">from</span><span class="p">[</span><span class="nx">start</span><span class="o">+</span><span class="nb">uint64</span><span class="p">(</span><span class="nx">i</span><span class="p">)])</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="p">(</span><span class="mi">8</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">val</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">loop</span><span class="p">(</span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mh">0xB8</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mh">0xC0</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// mov r16/32/64, imm16/32/64</span>
<span class="w"> </span><span class="nx">lreg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">register</span><span class="p">(</span><span class="nx">inb1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mh">0xB8</span><span class="p">)</span>
<span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">ip</span><span class="o">+</span><span class="nb">uint64</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="nx">widthPrefix</span><span class="o">/</span><span class="mi">8</span><span class="p">)</span>
<span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">widthPrefix</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">8</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">lreg</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">...</span>
</pre></div>
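<p>To make the immediate decoding concrete, here is a minimal standalone sketch. It assumes the same little-endian loop as <code>readBytes</code>; the name <code>readLE</code> and the hard-coded byte slice are made up for illustration and are not part of the emulator.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readLE assembles n bytes starting at start into a little-endian integer.
// It is a hypothetical stand-in that mirrors the loop in readBytes.
func readLE(mem []byte, start uint64, n int) uint64 {
	val := uint64(0)
	for i := 0; i < n; i++ {
		val |= uint64(mem[start+uint64(i)]) << (8 * i)
	}
	return val
}

func main() {
	// b8 04 00 00 00 encodes `mov eax, 4`: opcode 0xB8 plus the register
	// number, followed by a 32-bit little-endian immediate.
	code := []byte{0xb8, 0x04, 0x00, 0x00, 0x00}
	reg := code[0] - 0xb8
	imm := readLE(code, 1, 4)
	fmt.Println(reg, imm) // 0 4

	// The standard library decodes the same four bytes to the same value.
	fmt.Println(imm == uint64(binary.LittleEndian.Uint32(code[1:5]))) // true
}
```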
<p>Two more instructions to go: <code>pop</code> and <code>ret</code>.</p>
<h3 id="a-terminal-debugger">A terminal debugger</h3><p>Let's take a break for a moment: our system is already complex enough
to be hard to follow. It would help to have a REPL so we can step through
instructions and print register and memory values.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">resolveDebuggerValue</span><span class="p">(</span><span class="nx">dval</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">uint64</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">reg</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">registerMap</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">dval</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">reg</span><span class="p">),</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">dval</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="p">(</span><span class="nx">dval</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"0x"</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">dval</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"0X"</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseUint</span><span class="p">(</span><span class="nx">dval</span><span class="p">[</span><span class="mi">2</span><span class="p">:],</span><span class="w"> </span><span class="mi">16</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseUint</span><span class="p">(</span><span class="nx">dval</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">repl</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"go-amd64-emulator REPL"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">help</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">`commands:</span>
<span class="s"> s/step: continue to next instruction</span>
<span class="s"> r/registers [$reg]: print all register values or just $reg</span>
<span class="s"> d/decimal: toggle hex/decimal printing</span>
<span class="s"> m/memory $from $count: print memory values starting at $from until $from+$count</span>
<span class="s"> h/help: print this`</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">help</span><span class="p">)</span>
<span class="w"> </span><span class="nx">scanner</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.</span><span class="nx">NewScanner</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdin</span><span class="p">)</span>
<span class="w"> </span><span class="nx">intFormat</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">"%d"</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"> "</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Scan</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">input</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Text</span><span class="p">()</span>
<span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">input</span><span class="p">,</span><span class="w"> </span><span class="s">" "</span><span class="p">)</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"h"</span><span class="p">:</span>
<span class="w"> </span><span class="k">fallthrough</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"help"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">help</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"m"</span><span class="p">:</span>
<span class="w"> </span><span class="k">fallthrough</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"memory"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">"Invalid arguments: m/memory $from $count; use hex (0x10), decimal (10), or register name (rsp)"</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">parts</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">from</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">resolveDebuggerValue</span><span class="p">(</span><span class="nx">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">to</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">resolveDebuggerValue</span><span class="p">(</span><span class="nx">parts</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">hbdebug</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"memory["</span><span class="o">+</span><span class="nx">intFormat</span><span class="o">+</span><span class="s">":"</span><span class="o">+</span><span class="nx">intFormat</span><span class="o">+</span><span class="s">"]"</span><span class="p">,</span><span class="w"> </span><span class="nx">from</span><span class="p">,</span><span class="w"> </span><span class="nx">from</span><span class="o">+</span><span class="nx">to</span><span class="p">),</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">from</span><span class="p">:</span><span class="nx">from</span><span class="o">+</span><span class="nx">to</span><span class="p">])</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"d"</span><span class="p">:</span>
<span class="w"> </span><span class="k">fallthrough</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"decimal"</span><span class="p">:</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">intFormat</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"%d"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">intFormat</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"0x%x"</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Numbers displayed as hex"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">intFormat</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"%d"</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Numbers displayed as decimal"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"r"</span><span class="p">:</span>
<span class="w"> </span><span class="k">fallthrough</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"registers"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">filter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">""</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">parts</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">filter</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">registerMap</span><span class="p">);</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">reg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">register</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">registerMap</span><span class="p">[</span><span class="nx">reg</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">filter</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">filter</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"%s:\t"</span><span class="o">+</span><span class="nx">intFormat</span><span class="o">+</span><span class="s">"\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">reg</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"s"</span><span class="p">:</span>
<span class="w"> </span><span class="k">fallthrough</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"step"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">tick</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
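<p>The <code>step</code> command only sends on <code>c.tick</code>; the receiving half lives in the CPU loop. Here is a minimal sketch of that handshake. It assumes the loop blocks on the channel before each instruction. The <code>stepper</code> type and the extra <code>done</code> channel are hypothetical additions, there so the debugger only reads state after the step has finished.</p>

```go
package main

import "fmt"

// stepper is a hypothetical stand-in for the cpu struct.
type stepper struct {
	tick chan bool // debugger -> CPU: run one instruction
	done chan bool // CPU -> debugger: instruction finished (an assumption)
	ip   int
}

// run executes n fake instructions, one per tick received.
func (s *stepper) run(n int) {
	for i := 0; i < n; i++ {
		<-s.tick       // block until the debugger asks for a step
		s.ip++         // stand-in for decoding and executing one instruction
		s.done <- true // acknowledge so the debugger sees a consistent state
	}
}

// step asks for one instruction and waits for it to complete.
func (s *stepper) step() int {
	s.tick <- true
	<-s.done
	return s.ip
}

func main() {
	s := &stepper{tick: make(chan bool), done: make(chan bool)}
	go s.run(3)
	fmt.Println(s.step()) // 1
	fmt.Println(s.step()) // 2
}
```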
<p>Let's try it out:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main
$<span class="w"> </span>./main<span class="w"> </span>a.out<span class="w"> </span>--debug
go-amd64-emulator<span class="w"> </span>REPL
commands:
<span class="w"> </span>s/step:<span class="w"> </span><span class="k">continue</span><span class="w"> </span>to<span class="w"> </span>next<span class="w"> </span>instruction
<span class="w"> </span>r/registers<span class="w"> </span><span class="o">[</span><span class="nv">$reg</span><span class="o">]</span>:<span class="w"> </span>print<span class="w"> </span>all<span class="w"> </span>register<span class="w"> </span>values<span class="w"> </span>or<span class="w"> </span>just<span class="w"> </span><span class="nv">$reg</span>
<span class="w"> </span>d/decimal:<span class="w"> </span>toggle<span class="w"> </span>hex/decimal<span class="w"> </span>printing
<span class="w"> </span>m/memory<span class="w"> </span><span class="nv">$from</span><span class="w"> </span><span class="nv">$count</span>:<span class="w"> </span>print<span class="w"> </span>memory<span class="w"> </span>values<span class="w"> </span>starting<span class="w"> </span>at<span class="w"> </span><span class="nv">$from</span><span class="w"> </span><span class="k">until</span><span class="w"> </span><span class="nv">$from</span>+<span class="nv">$count</span>
<span class="w"> </span>h/help:<span class="w"> </span>print<span class="w"> </span>this
><span class="w"> </span>r
rax:<span class="w"> </span><span class="m">0</span>
rcx:<span class="w"> </span><span class="m">0</span>
rdx:<span class="w"> </span><span class="m">0</span>
rbx:<span class="w"> </span><span class="m">0</span>
rsp:<span class="w"> </span><span class="m">41943040</span>
rbp:<span class="w"> </span><span class="m">0</span>
rsi:<span class="w"> </span><span class="m">0</span>
rdi:<span class="w"> </span><span class="m">0</span>
r8:<span class="w"> </span><span class="m">0</span>
r9:<span class="w"> </span><span class="m">0</span>
r10:<span class="w"> </span><span class="m">0</span>
r11:<span class="w"> </span><span class="m">0</span>
r12:<span class="w"> </span><span class="m">0</span>
r13:<span class="w"> </span><span class="m">0</span>
r14:<span class="w"> </span><span class="m">0</span>
r15:<span class="w"> </span><span class="m">0</span>
rip:<span class="w"> </span><span class="m">4198662</span>
rflags:<span class="w"> </span><span class="m">0</span>
><span class="w"> </span>m<span class="w"> </span>rip<span class="w"> </span><span class="m">10</span>
memory<span class="o">[</span><span class="m">4198662</span>:4198672<span class="o">]</span>:<span class="w"> </span><span class="m">55</span><span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>b8<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>5d
><span class="w"> </span>s
><span class="w"> </span>m<span class="w"> </span>rip<span class="w"> </span><span class="m">10</span>
memory<span class="o">[</span><span class="m">4198663</span>:4198673<span class="o">]</span>:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>b8<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>5d<span class="w"> </span>c3
><span class="w"> </span>r
rax:<span class="w"> </span><span class="m">0</span>
rcx:<span class="w"> </span><span class="m">0</span>
rdx:<span class="w"> </span><span class="m">0</span>
rbx:<span class="w"> </span><span class="m">0</span>
rsp:<span class="w"> </span><span class="m">41943032</span>
rbp:<span class="w"> </span><span class="m">0</span>
rsi:<span class="w"> </span><span class="m">0</span>
rdi:<span class="w"> </span><span class="m">0</span>
r8:<span class="w"> </span><span class="m">0</span>
r9:<span class="w"> </span><span class="m">0</span>
r10:<span class="w"> </span><span class="m">0</span>
r11:<span class="w"> </span><span class="m">0</span>
r12:<span class="w"> </span><span class="m">0</span>
r13:<span class="w"> </span><span class="m">0</span>
r14:<span class="w"> </span><span class="m">0</span>
r15:<span class="w"> </span><span class="m">0</span>
rip:<span class="w"> </span><span class="m">4198663</span>
rflags:<span class="w"> </span><span class="m">0</span>
><span class="w"> </span>^D
</pre></div>
<p>Now we can inspect the system interactively.</p>
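<p>A command like <code>m rip 10</code> depends on resolving each argument as a register name, a hex literal, or a decimal literal. The following standalone sketch shows that lookup together with a bounds check before slicing memory. The <code>regs</code> map, <code>resolve</code>, and <code>memRange</code> are hypothetical names, and the bounds check is an addition that the REPL above does not perform.</p>

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// regs is a hypothetical stand-in for the emulator's register file.
var regs = map[string]uint64{"rsp": 41943040, "rip": 4198662}

// resolve turns a register name, a 0x-prefixed hex literal, or a decimal
// literal into a number.
func resolve(arg string) (uint64, error) {
	if v, ok := regs[arg]; ok {
		return v, nil
	}
	if lower := strings.ToLower(arg); strings.HasPrefix(lower, "0x") {
		return strconv.ParseUint(lower[2:], 16, 64)
	}
	return strconv.ParseUint(arg, 10, 64)
}

// memRange parses "m $from $count" and returns the slice bounds, rejecting
// ranges that fall outside memLen bytes of memory.
func memRange(line string, memLen uint64) (uint64, uint64, error) {
	// strings.Fields tolerates repeated spaces, unlike splitting on " ".
	parts := strings.Fields(line)
	if len(parts) != 3 {
		return 0, 0, fmt.Errorf("usage: m/memory $from $count")
	}
	from, err := resolve(parts[1])
	if err != nil {
		return 0, 0, err
	}
	count, err := resolve(parts[2])
	if err != nil {
		return 0, 0, err
	}
	if from > memLen || count > memLen-from {
		return 0, 0, fmt.Errorf("range %d+%d is outside %d bytes of memory", from, count, memLen)
	}
	return from, from + count, nil
}

func main() {
	fmt.Println(memRange("m rip 10", 0x2800000))      // 4198662 4198672 <nil>
	fmt.Println(memRange("m 0x2800000 1", 0x2800000)) // rejected: out of range
}
```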
<h3 id="pop">pop</h3><p>Re-immersing ourselves in the state of things, we now panic on <code>0x5D</code>.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./main<span class="w"> </span>a.out
prog:<span class="w"> </span>5d<span class="w"> </span>c3<span class="w"> </span><span class="m">66</span><span class="w"> </span>2e<span class="w"> </span>f<span class="w"> </span>1f<span class="w"> </span><span class="m">84</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>
panic:<span class="w"> </span>Unknown<span class="w"> </span>instruction
goroutine<span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="o">[</span>running<span class="o">]</span>:
main.<span class="o">(</span>*cpu<span class="o">)</span>.loop<span class="o">(</span>0xc000098ae0,<span class="w"> </span>0x2800000<span class="o">)</span>
<span class="w"> </span>/home/phil/tmp/goamd/main.go:219<span class="w"> </span>+0x2c5
main.<span class="o">(</span>*cpu<span class="o">)</span>.run<span class="o">(</span>0xc000098ae0,<span class="w"> </span>0xc000098ab0<span class="o">)</span>
<span class="w"> </span>/home/phil/tmp/goamd/main.go:231<span class="w"> </span>+0xac
created<span class="w"> </span>by<span class="w"> </span>main.main
<span class="w"> </span>/home/phil/tmp/goamd/main.go:358<span class="w"> </span>+0x286
</pre></div>
<p>Looking <a href="http://ref.x86asm.net/coder64.html#x5D">this up</a>, we see it
is part of <code>58+r</code>, <code>pop</code>. As with
<code>push</code>, we subtract <code>0x58</code> from the byte to get
the register to pop into. The stack operation is the reverse of
<code>push</code>: read 8 bytes at <code>rsp</code> into the register, then add 8 to <code>rsp</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">loop</span><span class="p">(</span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mh">0x58</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mh">0x60</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// pop</span>
<span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">register</span><span class="p">(</span><span class="nx">inb1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mh">0x58</span><span class="p">)</span>
<span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rsp</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">lhs</span><span class="p">,</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">))</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rsp</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">sp</span><span class="o">+</span><span class="mi">8</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">...</span>
</pre></div>
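<p>To see why <code>pop</code> is the mirror image of <code>push</code>, here is a small standalone sketch of both operations on a byte slice. The <code>machine</code> type and its 64-byte memory are hypothetical. The only behavior taken from the emulator is that the stack grows downward in 8-byte steps, as the <code>rsp</code> values in the debugger session show.</p>

```go
package main

import "fmt"

// machine is a hypothetical, stripped-down stand-in for the cpu struct.
type machine struct {
	mem []byte
	rsp uint64
}

// push moves rsp down by 8 and stores v there in little-endian order.
func (m *machine) push(v uint64) {
	m.rsp -= 8
	for i := 0; i < 8; i++ {
		m.mem[m.rsp+uint64(i)] = byte(v >> (8 * i))
	}
}

// pop reads 8 little-endian bytes at rsp and then moves rsp up by 8.
func (m *machine) pop() uint64 {
	v := uint64(0)
	for i := 0; i < 8; i++ {
		v |= uint64(m.mem[m.rsp+uint64(i)]) << (8 * i)
	}
	m.rsp += 8
	return v
}

func main() {
	m := &machine{mem: make([]byte, 64), rsp: 64}
	m.push(0xdeadbeef)
	fmt.Println(m.rsp)           // 56
	fmt.Printf("%#x\n", m.pop()) // 0xdeadbeef
	fmt.Println(m.rsp)           // 64
}
```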
<p>Build and run for the final panic:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main
$<span class="w"> </span>./main<span class="w"> </span>a.out
prog:<span class="w"> </span>c3<span class="w"> </span><span class="m">66</span><span class="w"> </span>2e<span class="w"> </span>f<span class="w"> </span>1f<span class="w"> </span><span class="m">84</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>
panic:<span class="w"> </span>Unknown<span class="w"> </span>instruction
goroutine<span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="o">[</span>running<span class="o">]</span>:
main.<span class="o">(</span>*cpu<span class="o">)</span>.loop<span class="o">(</span>0xc000060c30,<span class="w"> </span>0x2800000<span class="o">)</span>
<span class="w"> </span>/home/phil/tmp/goamd/main.go:224<span class="w"> </span>+0x345
main.<span class="o">(</span>*cpu<span class="o">)</span>.run<span class="o">(</span>0xc000060c30,<span class="w"> </span>0xc000060c00<span class="o">)</span>
<span class="w"> </span>/home/phil/tmp/goamd/main.go:236<span class="w"> </span>+0xac
created<span class="w"> </span>by<span class="w"> </span>main.main
<span class="w"> </span>/home/phil/tmp/goamd/main.go:363<span class="w"> </span>+0x286
</pre></div>
<h3 id="ret">ret</h3><p>Looking up <a href="http://ref.x86asm.net/coder64.html#xC3">0xC3</a>, we see that
it is indeed <code>ret</code>. This instruction's responsibility is to pop
the stack into <code>rip</code>, jumping back to the caller.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mh">0xC3</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// ret</span>
<span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rsp</span><span class="p">)</span>
<span class="w"> </span><span class="nx">retAddress</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rsp</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">sp</span><span class="o">+</span><span class="mi">8</span><span class="p">))</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rip</span><span class="p">,</span><span class="w"> </span><span class="nx">retAddress</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
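<p>To make the pop concrete outside of the emulator, here is a minimal
sketch of the same three steps in Python rather than Go. The
<code>mem</code> bytearray, the <code>regs</code> dict and the
address <code>0x401136</code> are made up for illustration; they are
not the emulator's real data structures.</p>

```python
# Hypothetical mini-model of emulator state; names are illustrative only.
def ret(mem: bytearray, regs: dict) -> None:
    """Pop an 8-byte little-endian return address from the stack into rip."""
    sp = regs["rsp"]
    # amd64 stores multi-byte integers least-significant byte first.
    ret_address = int.from_bytes(mem[sp:sp + 8], "little")
    # The stack grows downward, so popping moves rsp up by 8 bytes.
    regs["rsp"] = sp + 8
    regs["rip"] = ret_address


mem = bytearray(64)
# Pretend an earlier `call` pushed the (made-up) return address at offset 32.
mem[32:40] = (0x401136).to_bytes(8, "little")
regs = {"rsp": 32, "rip": 0}
ret(mem, regs)
print(hex(regs["rip"]), regs["rsp"])
```

<p>Reading exactly 8 bytes and bumping <code>rsp</code> by exactly 8
is the kind of spot where off-by-one mistakes tend to creep in.</p>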
<p>Build and run:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main
$<span class="w"> </span>./main<span class="w"> </span>a.out
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">4</span>
</pre></div>
<p>What if we modify <code>tests/simple.c</code>?</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>tests/simple.c
int<span class="w"> </span>main<span class="o">()</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="m">254</span><span class="p">;</span>
<span class="o">}</span>
$<span class="w"> </span>gcc<span class="w"> </span>tests/simple.c
$<span class="w"> </span>./main<span class="w"> </span>a.out<span class="p">;</span><span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">254</span>
</pre></div>
<p>Not bad!</p>
<h3 id="process-and-next-steps">Process and next steps</h3><p>Getting this far took a lot of trial and error, much of it hidden in
this post. Setting up the REPL was critical to debugging mistakes,
though aggressive unit testing would probably have been similarly
fruitful. In the end, the most bug-prone aspects are basic arithmetic
(off-by-one errors and converting bytes to/from integers). The part
that's not terribly hard is actually interpreting instructions! But
that's only because we greatly simplified the problem and ignored a
legion of cases.</p>
<p>Along the way it would have been helpful to also disassemble so that
instead of just dumping memory at the instruction pointer we print the
instructions we thought we were going to process. That may be a next
goal.</p>
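<p>As a rough idea of what that first disassembly step could look
like, here is a sketch in Python (not the emulator's Go) that maps a
few single-byte opcodes to mnemonics. The table is a tiny illustrative
subset and <code>describe</code> is a hypothetical helper; a real
amd64 decoder also has to handle prefixes, ModRM bytes and
immediates.</p>

```python
# Single-byte opcodes only; an illustrative subset, not a full decoder.
MNEMONICS = {
    0x55: "push rbp",
    0x5D: "pop rbp",
    0x90: "nop",
    0xC3: "ret",
}


def describe(mem: bytes, rip: int) -> str:
    """Render the byte at rip as 'address: opcode  mnemonic'."""
    opcode = mem[rip]
    name = MNEMONICS.get(opcode, "unknown")
    return f"{rip:#06x}: {opcode:02x}  {name}"


code = bytes([0x55, 0x90, 0x5D, 0xC3])
lines = [describe(code, i) for i in range(len(code))]
for line in lines:
    print(line)
```

<p>Printing <code>unknown</code> instead of failing keeps the dump
useful even when the table is incomplete.</p>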
<p>Otherwise the typical goals are around getting syscall support,
function call support, and porting these simple examples to Windows
and macOS for the experience.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Here's take two on writing an emulator for linux/amd64 in Go. This time we're starting with ELF binaries, but still ignoring libc and jumping straight to main.<a href="https://t.co/A87r2RY21c">https://t.co/A87r2RY21c</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1332111601814691840?ref_src=twsrc%5Etfw">November 26, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/emulating-amd64-starting-with-elf.htmlThu, 26 Nov 2020 00:00:00 +0000
- The impact of management teams as a decision-making group, in startups and enterprisehttp://notes.eatonphil.com/the-impact-of-management-teams-on-startups-and-enterprises.html<p>Ambitious companies form management teams at every level above you,
sometimes including you. Management teams meet periodically and have
private chat rooms. They discuss customers, product and organizational
direction. Sometimes discussions are well documented and periodically
public. Sometimes decisions are poorly telegraphed out.</p>
<p>Management teams do no inherent harm in a company with customers;
employees outside of the management team can unearth customer usage
data to discover meaningful places to contribute. For example,
graphing historic server logs to discover the slowest requests, then figuring out
why they are slow and how to fix them. Or even just paying attention to the most frequent
questions sales asks and finding ways to clarify. (All of this under
the assumption that even when there is solid product direction, good
employees tend to have extra time at work and want to make good use of
it.)</p>
<p>For the first few years even in a well-funded startup with solid
founders, there are few customers. Even under a solid product team,
the product direction is not yet completely clear. The management team
includes founders and non-engineering executives. As a decision-making
group they are opaque. Employees outside the management team face a
barrier in finding ways to meaningfully contribute. Ambitious, dedicated
folks outside the team leave.</p>
<h3 id="so-what?">So what?</h3><p>It is not clear to me how the natural (and not inherently bad) concept
of management teams attracts and retains ambitious, dedicated
non-founders at small companies. Maybe disenfranchisement is not
important, or even necessary.</p>
<p>Or maybe management teams as a decision-making group are too easily a
substitute for developing a grassroots culture of collaboration and
trust between marketing, sales, product and development.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New post! "management teams as a decision-making group are too easily a substitute for developing a grassroots culture of collaboration and trust between marketing, sales, product and development."<a href="https://t.co/7RukBMI59h">https://t.co/7RukBMI59h</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1328482381314084864?ref_src=twsrc%5Etfw">November 16, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/the-impact-of-management-teams-on-startups-and-enterprises.htmlWed, 11 Nov 2020 00:00:00 +0000
- Standard ML in 2020http://notes.eatonphil.com/standard-ml-in-2020.html<p>Incredibly, Standard ML implementations are still actively
developed. <a href="http://mlton.org/">MLton</a>, <a href="https://polyml.org">Poly/ML</a>,
<a href="https://elsman.com/mlkit/">MLKit</a>,
<a href="https://www.pllab.riec.tohoku.ac.jp/smlsharp/">SML#</a> and
<a href="http://smlnj-gforge.cs.uchicago.edu/scm/viewvc.php/?root=smlnj">SML/NJ</a>
are the most prominent. Discussion on the future direction of Standard
ML <a href="https://github.com/SMLFamily/Successor-ML/issues">remains healthy as
well</a>.</p>
<p>And somehow OCaml's lesser known cousin still beats out OCaml for
multicore threading support (in Poly/ML).</p>
<p>While MLton hasn't merged with
<a href="https://github.com/kayceesrk/multiMLton">MultiMLton</a> or
<a href="https://github.com/UBMLtonGroup/RTMLton">RTMLton</a> to support
multicore, a <a href="https://github.com/mpllang/mpl">new fork of MLton with
parallelism</a> is pretty far along and
in active development at CMU.</p>
<p class="note">
  A commenter shared
<a href="https://github.com/ManticoreProject/manticore">Manticore</a>,
another implementation with parallelism support in active
development at UChicago.
</p><p>Furthermore, the last few years have welcomed some entirely new
implementations. <a href="https://github.com/KeenS/webml">WebML</a>, by a
prominent open source hacker, is written in Rust and compiles Standard
ML to WebAssembly. <a href="https://sosml.org/">SOSML</a> is an interpreter
written in TypeScript by former students of Saarland University. It
features <a href="https://sosml.org/editor">a nifty online
IDE</a>.</p>
<p class="note">
A commenter
shared <a href="https://github.com/SomewhatML/sml-compiler">SomewhatML</a>,
  a compiler for Standard ML written in Rust and under active development.
</p><p>There have also been some new experimental spins on Standard ML in
the last few years. <a href="https://github.com/julianhyde/morel">Morel</a> is an
interpreter with some nice syntax extensions written in Java by the
author of Apache Calcite. And <a href="https://github.com/elpinal/bright-ml">Bright
ML</a> is a spin on Standard ML and
OCaml written in Standard ML (and using the abandoned <a href="https://mosml.org/">Moscow
ML</a> compiler of all implementations).</p>
<p>So if you're looking for an easy intro to the ML family of languages,
I still recommend the simplicity and performance of Standard ML and
its small but definitely, surprisingly, not dead community. :)</p>
<p>Additional resources:</p>
<ul>
<li><a href="https://smlfamily.github.io/">SML Family Site</a></li>
<li><a href="https://smlfamily.github.io/Basis/index.html">SML Standard Library (Basis Library) Documentation</a></li>
<li><a href="https://reddit.com/r/sml">/r/sml</a></li>
</ul>
<p>Are you using Standard ML? <a href="mailto:[email protected]">Let me know how/why!</a></p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Standard ML implementations are still in active development! There have even been some interesting new implementations pop up in the last few years.<a href="https://t.co/6kOcMKVfQa">https://t.co/6kOcMKVfQa</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1320487302418845696?ref_src=twsrc%5Etfw">October 25, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/standard-ml-in-2020.htmlSun, 25 Oct 2020 00:00:00 +0000
- The case for comments in codehttp://notes.eatonphil.com/the-case-for-comments-in-code.html<p>When I first started programming, especially when asked for code
samples, my comments lacked purpose and would often duplicate in
English what the code clearly indicated. I knew that "commenting is
good" but as a beginner I had no further insight.</p>
<p>Over time with the help of books like Clean Code, I grew disdainful of
comments. Good code should be self-documenting. Whenever I needed to
write a comment to explain something, I'd realize I could easily
rename some key variable or function. I grew more comfortable with
variables and functions with a few words in the title. Better to spend
time on good code structure and naming.</p>
<p class="note">
I have always left TODOs though, since TODOs can't so easily be
expressed in variable names. But even these TODOs concerned me
because they existed in my issue tracker, or maybe should have.
</p><p>As I watched mature open source projects and mature engineers, I came
to value well-documented pull requests. Solid pull requests include or
link to all necessary background, opportunities failed or ignored, how
to use, links to external bugs requiring workarounds and the results
of performance evaluation.</p>
<p>Beyond pull request descriptions, when I really wanted to grease a
pull request I'd use the pull request UI to add comments calling
reviewer attention to key changes in lines of the diff.</p>
<p>Both kinds of guidance are a massive aid to reviewers, saving a lot of
time.</p>
<p>But when I'd find a bug in code -- and I knew there was good pull
request documentation, even for pull requests as recent as six months
ago -- I'd be repeatedly failed by the pull request and <em>pull
request comment</em> search exposed by GitHub and GitLab.</p>
<p>I <em>knew</em> there were links to documented oddities or bug reports in
pull request threads. But practically speaking, for historic pull
requests, pull request comments are useless.</p>
<p>This is the single biggest reason I've started to push for more
comments in code. More so than all other tools (issue tracker, code
management system, etc.) comments in code have the greatest chance of
still being around and <em>easily searchable</em> if they haven't been
deleted.</p>
<p class="note">
Don't get me started on pull request documentation in an external
medium like Slack. It's so rewarding to get or give instant feedback
on changes on instant messengers, but good luck finding that
discussion 3 months later.
</p><p>Every time I have to call out a line of code in a pull request, that's
immediate cause for that code to be modified with comments.</p>
<p>Maybe I wouldn't do this if GitHub/GitLab exposed a Google Docs-like
interface for browsing code line by line with links to all pull
request comment threads.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">The biggest reason to add comments in code (often linking to documented oddities or bug reports) is because it's impossible to search pull request threads historically in every source control management UI I've used.<a href="https://t.co/JlHWfbUH5z">https://t.co/JlHWfbUH5z</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1303130504993136642?ref_src=twsrc%5Etfw">September 8, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/the-case-for-comments-in-code.htmlMon, 07 Sep 2020 00:00:00 +0000
- Writing a simple Python compiler: 1. hello, fibonaccihttp://notes.eatonphil.com/writing-a-simple-python-compiler.html<p>In this post we'll write a Python to C compiler in Python. This is
especially easy to do since Python has a <a href="https://docs.python.org/3/library/ast.html">builtin parser
library</a> and because a
number of <a href="https://docs.python.org/3/c-api/">CPython internals are exposed for extension
writers</a>.</p>
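<p>To get a feel for what the builtin parser hands us, here is a small
sketch using only the standard library <code>ast</code> module. It
parses the fibonacci program from an inline string (a stand-in for
reading the file) and pokes at the top-level nodes.</p>

```python
import ast

# Inline copy of the program we want to compile.
source = """
def fib(n):
    if n == 0 or n == 1:
        return n
    return fib(n - 1) + fib(n - 2)

def main():
    print(fib(40))
"""

tree = ast.parse(source)

# A module's body is a list of statements; ours are two function definitions.
function_names = [
    node.name for node in tree.body if isinstance(node, ast.FunctionDef)
]
print(function_names)

# Each function body is itself a list of statement nodes.
fib_def = tree.body[0]
first_statement = type(fib_def.body[0]).__name__
print(first_statement)
```

<p>Calling <code>ast.dump</code> on any of these nodes prints the full
nested structure, which is handy when deciding what the code generator
needs to handle.</p>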
<p>By the end of this post, in a few hundred lines of Python, we'll be able to
compile and run the following program:</p>
<div class="highlight"><pre><span></span><span class="err">$</span> <span class="n">cat</span> <span class="n">tests</span><span class="o">/</span><span class="n">recursive_fib</span><span class="o">.</span><span class="n">py</span>
<span class="k">def</span> <span class="nf">fib</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="k">if</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">0</span> <span class="ow">or</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
<span class="k">return</span> <span class="n">n</span>
<span class="k">return</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
<span class="nb">print</span><span class="p">(</span><span class="n">fib</span><span class="p">(</span><span class="mi">40</span><span class="p">))</span>
<span class="err">$</span> <span class="n">python3</span> <span class="n">pyc</span> <span class="n">tests</span><span class="o">/</span><span class="n">recursive_fib</span><span class="o">.</span><span class="n">py</span>
<span class="err">$</span> <span class="o">./</span><span class="nb">bin</span><span class="o">/</span><span class="n">a</span><span class="o">.</span><span class="n">out</span>
<span class="mi">102334155</span>
</pre></div>
<p>This post implements an extremely small subset of Python and
<strong>completely gives up on even trying to manage memory</strong> because I
cannot fathom manual reference counting. Maybe some day I'll find a
way to swap in an easy GC like Boehm.</p>
<p><a href="https://github.com/eatonphil/pyc">Source code for this project is available on Github.</a></p>
<h3 id="dependencies">Dependencies</h3><p>We'll need Python 3, GCC, libpython3, and clang-format.</p>
<p>On Fedora-based systems:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>dnf<span class="w"> </span>install<span class="w"> </span>gcc<span class="w"> </span>python3-devel<span class="w"> </span>clang-format<span class="w"> </span>python3
</pre></div>
<p>And on Debian-based systems:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>apt<span class="w"> </span>install<span class="w"> </span>gcc<span class="w"> </span>python3-dev<span class="w"> </span>clang-format<span class="w"> </span>python3
</pre></div>
<p class="note">
This program will likely work as well on Windows, Mac, FreeBSD,
etc. but I haven't gone through the trouble of testing this (or
providing custom compiler directives). Pull requests welcome!
</p><h3 id="a-hand-written-first-pass">A hand-written first-pass</h3><p>Before we get into the compiler, let's write the fibonacci program by
hand in C using libpython.</p>
<p>As described in the <a href="https://docs.python.org/3/extending/embedding.html#very-high-level-embedding">Python embedding
guide</a>
we'll need to include libpython and initialize it in
our <code>main.c</code>:</p>
<div class="highlight"><pre><span></span><span class="cp">#define PY_SSIZE_T_CLEAN</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><Python.h></span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Py_Initialize</span><span class="p">();</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>To compile against libpython, we'll use
<a href="https://helpmanual.io/man1/python3-config/">python3-config</a> installed
as part of <code>python3-devel</code> to tell us which flags to pass
at each step of compilation and linking.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>gcc<span class="w"> </span>-c<span class="w"> </span>-o<span class="w"> </span>main.o<span class="w"> </span><span class="k">$(</span>python3-config<span class="w"> </span>--cflags<span class="k">)</span><span class="w"> </span>main.c
$<span class="w"> </span>gcc<span class="w"> </span><span class="k">$(</span>python3-config<span class="w"> </span>--ldflags<span class="k">)</span><span class="w"> </span>main.o
$<span class="w"> </span>./a.out<span class="p">;</span><span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">0</span>
</pre></div>
<p>Cool! Now as we think about translating the fibonacci implementation,
we want to keep everything as Python objects for as long as
possible. This means passing and receiving
<a href="https://docs.python.org/3/c-api/object.html">PyObject*</a> to and from
all functions, and converting all C integers to
<a href="https://docs.python.org/3/c-api/long.html">PyLong*</a>, a "subtype" of
<code>PyObject*</code>. You can imagine that everything in Python is
an <code>object</code> until you operate on it.</p>
<p class="note">
For more information on objects in Python, check out
the <a href="https://docs.python.org/3/reference/datamodel.html">Data
model</a> page in Python docs.
</p><p>To map a C integer to a <code>PyLong*</code> we use
<a href="https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong">PyLong_FromLong</a>. To
map in reverse, we use
<a href="https://docs.python.org/3/c-api/long.html#c.PyLong_AsLong">PyLong_AsLong</a>.</p>
<p>To compare two <code>PyObject*</code>s we can use
<a href="https://docs.python.org/3/c-api/object.html#c.PyObject_RichCompareBool">PyObject_RichCompareBool</a>
which will handle the comparison regardless of the type of the two
parameters. Without this helper we'd have to write complex checks to
make sure that the two sides are the same type and, if they are, unwrap
them into their underlying C values and compare those.</p>
<p>We can use
<a href="https://docs.python.org/3/c-api/number.html#c.PyNumber_Add">PyNumber_Add</a>
and
<a href="https://docs.python.org/3/c-api/number.html#c.PyNumber_Subtract">PyNumber_Subtract</a>
for basic arithmetic, and there are many similar helpers available to
us for operations down the line.</p>
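<p>It may help to see the shape of the translation in Python first,
with each operation spelled out as a function call. This is only a
sketch: the <code>operator</code> functions are rough Python-level
stand-ins for the C helpers named in the comments, not what the
compiler emits.</p>

```python
import operator


def fib(n):
    # operator.eq(a, b) roughly plays the role of
    # PyObject_RichCompareBool(a, b, Py_EQ) in C.
    if operator.eq(n, 0) or operator.eq(n, 1):
        return n
    # operator.sub / operator.add stand in for
    # PyNumber_Subtract / PyNumber_Add.
    left = fib(operator.sub(n, 1))
    right = fib(operator.sub(n, 2))
    return operator.add(left, right)


result = fib(7)
print(result)
```

<p>Every intermediate value stays a Python object here, which is the
same property we want the C version to have.</p>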
<p>Now we can write a translation:</p>
<div class="highlight"><pre><span></span><span class="cp">#define PY_SSIZE_T_CLEAN</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><Python.h></span>
<span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="nf">fib</span><span class="p">(</span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">zero</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">one</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">PyObject_RichCompareBool</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">zero</span><span class="p">,</span><span class="w"> </span><span class="n">Py_EQ</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">PyObject_RichCompareBool</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">one</span><span class="p">,</span><span class="w"> </span><span class="n">Py_EQ</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">n</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fib</span><span class="p">(</span><span class="n">PyNumber_Subtract</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">one</span><span class="p">));</span>
<span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">two</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span>
<span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fib</span><span class="p">(</span><span class="n">PyNumber_Subtract</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">two</span><span class="p">));</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyNumber_Add</span><span class="p">(</span><span class="n">left</span><span class="p">,</span><span class="w"> </span><span class="n">right</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Py_Initialize</span><span class="p">();</span>
<span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fib</span><span class="p">(</span><span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">7</span><span class="p">));</span><span class="w"> </span><span class="c1">// Should be 13</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyLong_AsLong</span><span class="p">(</span><span class="n">res</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Compile and run it:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>gcc<span class="w"> </span>-c<span class="w"> </span>-o<span class="w"> </span>main.o<span class="w"> </span><span class="k">$(</span>python3-config<span class="w"> </span>--cflags<span class="k">)</span><span class="w"> </span>main.c
$<span class="w"> </span>gcc<span class="w"> </span><span class="k">$(</span>python3-config<span class="w"> </span>--ldflags<span class="k">)</span><span class="w"> </span>main.o
$<span class="w"> </span>./a.out<span class="p">;</span><span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">13</span>
</pre></div>
<p>That's great! But we cheated in one place. We assumed that the input
to the <code>fib</code> function was an integer, and we propagated
that assumption everywhere we wrote <code>PyNumber_*</code>
operations. When we write the compiler, we'll need to check that both
arguments are integers before we call a numeric helper; otherwise we
may need to call a string concatenation helper or something else
entirely.</p>
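<p>The kind of check described above can be sketched in Python as
follows. <code>pyc_add</code> is a hypothetical name, and the string
branch and the <code>TypeError</code> are assumptions about where this
could go, not something this post implements.</p>

```python
def pyc_add(l, r):
    """Add two values, dispatching on their runtime types."""
    # isinstance(x, int) is also True for bool, since bool subclasses int.
    if isinstance(l, int) and isinstance(r, int):
        return l + r
    # Assumed extension: strings would need a concatenation helper instead.
    if isinstance(l, str) and isinstance(r, str):
        return l + r
    # Assumed behavior: reject anything we don't know how to add.
    raise TypeError(
        f"unsupported operand types: {type(l).__name__}, {type(r).__name__}"
    )


print(pyc_add(2, 3))
print(pyc_add("a", "b"))
```

<p>Keeping the type check inside one helper means generated code can
call it blindly and the supported types can grow in one place.</p>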
<h3 id="compiler-architecture">Compiler Architecture</h3><p>We'll break the code into four major parts:</p>
<ol>
<li><code>libpyc.c</code>: helper functions for generated code</li>
<li><code>pyc/context.py</code>: utilities for scope and writing code in memory</li>
<li><code>pyc/codegen.py</code>: for generating C code from a Python AST</li>
<li><code>pyc/__main__.py</code>: the entrypoint</li>
</ol>
<p class="note">
When I'm writing a new compiler using an existing parser I almost
always start with the entrypoint and code generator so I can explore
the AST. However, it's easiest to explain the code if we start with
the utilities first.
</p><p>We'll also want an empty <code>pyc/__init__.py</code>.</p>
<h3 id="libpyc.c">libpyc.c</h3><p>This C file will contain three helper functions for safely adding,
subtracting, and printing. It will be concatenated to the top of the
generated C file. We'll only support integers for now but this
structure sets us up for supporting more types later on.</p>
<p>We'll use
<a href="https://docs.python.org/3/c-api/long.html#c.PyLong_Check">PyLong_Check</a>
before calling number-specific methods.</p>
<div class="highlight"><pre><span></span><span class="cp">#define PY_SSIZE_T_CLEAN</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><Python.h></span>
<span class="kr">inline</span><span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="nf">PYC_Add</span><span class="p">(</span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">l</span><span class="p">,</span><span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">r</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// TODO: allow __add__ override</span>
<span class="w"> </span><span class="c1">// Includes ints and bools</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">PyLong_Check</span><span class="p">(</span><span class="n">l</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">PyLong_Check</span><span class="p">(</span><span class="n">r</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyNumber_Add</span><span class="p">(</span><span class="n">l</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// TODO: handle str, etc.</span>
<span class="w"> </span><span class="c1">// TODO: throw exception</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="kr">inline</span><span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="nf">PYC_Sub</span><span class="p">(</span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">l</span><span class="p">,</span><span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">r</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// TODO: allow __sub__ override</span>
<span class="w"> </span><span class="c1">// Includes ints and bools</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">PyLong_Check</span><span class="p">(</span><span class="n">l</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">PyLong_Check</span><span class="p">(</span><span class="n">r</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyNumber_Subtract</span><span class="p">(</span><span class="n">l</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// TODO: handle str, etc.</span>
<span class="w"> </span><span class="c1">// TODO: throw exception</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="kr">inline</span><span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="nf">PYC_Print</span><span class="p">(</span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">o</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">PyObject_Print</span><span class="p">(</span><span class="n">o</span><span class="p">,</span><span class="w"> </span><span class="n">stdout</span><span class="p">,</span><span class="w"> </span><span class="n">Py_PRINT_RAW</span><span class="p">);</span>
<span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Py_None</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>That's it! We could generate these as strings in Python, but that gets
hairy. By using a dedicated C file, we can take advantage of
syntax highlighting since this file is only C code. And since we've
marked all functions as <code>inline</code>, there's no runtime cost
to keeping them in a separate file rather than embedding them as strings in Python.</p>
<h3 id="pyc/context.py">pyc/context.py</h3><p>This file will contain a <code>Context</code> class for managing
identifiers in scope and for proxying to a <code>Writer</code> class
that contains helpers for writing lines of C code.</p>
<p>We'll have two instances of the <code>Writer</code> class in
<code>Context</code> so that we can write to a body (or
current/primary) region and an initialization region.</p>
<p>The initialization region is necessary in case there are any variables
declared at the top-level. We can't initialize these variables in C
outside of a function since every <code>PyObject*</code> must be
created after calling <code>Py_Initialize</code>. This section will be
written into our C <code>main</code> function before we enter a
compiled Python <code>main</code> function.</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">copy</span>
<span class="k">class</span> <span class="nc">Writer</span><span class="p">():</span>
    <span class="n">content</span> <span class="o">=</span> <span class="s2">""</span>
    <span class="k">def</span> <span class="nf">write</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">exp</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">indent</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">0</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">content</span> <span class="o">+=</span> <span class="p">(</span><span class="s2">" "</span> <span class="o">*</span> <span class="n">indent</span><span class="p">)</span> <span class="o">+</span> <span class="n">exp</span>
    <span class="k">def</span> <span class="nf">writeln</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">stmt</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">indent</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">0</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">stmt</span> <span class="o">+</span> <span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span> <span class="n">indent</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">write_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">stmt</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">indent</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">0</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">writeln</span><span class="p">(</span><span class="n">stmt</span> <span class="o">+</span> <span class="s2">";"</span><span class="p">,</span> <span class="n">indent</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Context</span><span class="p">():</span>
    <span class="n">initializations</span> <span class="o">=</span> <span class="n">Writer</span><span class="p">()</span>
    <span class="n">body</span> <span class="o">=</span> <span class="n">Writer</span><span class="p">()</span>
    <span class="n">indentation</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">scope</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">ret</span> <span class="o">=</span> <span class="kc">None</span>
    <span class="n">namings</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="n">counter</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span>
    <span class="k">def</span> <span class="fm">__getattr__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">object</span><span class="p">:</span>
        <span class="c1"># Helpers to avoid passing in self.indentation every time</span>
        <span class="n">outputs</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"initializations"</span><span class="p">,</span> <span class="s2">"body"</span><span class="p">]</span>
        <span class="k">for</span> <span class="n">output</span> <span class="ow">in</span> <span class="n">outputs</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">name</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="n">output</span><span class="p">):</span>
                <span class="k">return</span> <span class="k">lambda</span> <span class="n">s</span><span class="p">,</span> <span class="n">i</span><span class="o">=</span><span class="kc">None</span><span class="p">:</span> <span class="nb">getattr</span><span class="p">(</span><span class="nb">getattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">output</span><span class="p">),</span> <span class="n">name</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">output</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">:])(</span><span class="n">s</span><span class="p">,</span> <span class="n">i</span> <span class="k">if</span> <span class="n">i</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="k">else</span> <span class="bp">self</span><span class="o">.</span><span class="n">indentation</span><span class="p">)</span>
        <span class="c1"># object defines no __getattr__, so signal the missing attribute directly</span>
        <span class="k">raise</span> <span class="ne">AttributeError</span><span class="p">(</span><span class="n">name</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">get_local</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">source_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-></span> <span class="nb">dict</span><span class="p">:</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">namings</span><span class="p">[</span><span class="n">source_name</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">register_global</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">loc</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">namings</span><span class="p">[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
            <span class="s2">"name"</span><span class="p">:</span> <span class="n">loc</span><span class="p">,</span>
            <span class="s2">"scope"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span>
        <span class="p">}</span>
    <span class="k">def</span> <span class="nf">register_local</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">local</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">"tmp"</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">counter</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">namings</span><span class="p">[</span><span class="n">local</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
            <span class="s2">"name"</span><span class="p">:</span> <span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">local</span><span class="si">}</span><span class="s2">_</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">counter</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span>
            <span class="c1"># naming dictionary is copied, so we need to capture scope</span>
            <span class="c1"># at declaration</span>
            <span class="s2">"scope"</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope</span><span class="p">,</span>
        <span class="p">}</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">namings</span><span class="p">[</span><span class="n">local</span><span class="p">][</span><span class="s2">"name"</span><span class="p">]</span>
    <span class="k">def</span> <span class="nf">copy</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">new</span> <span class="o">=</span> <span class="n">copy</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
        <span class="c1"># For some reason copy.deepcopy doesn't do this</span>
        <span class="n">new</span><span class="o">.</span><span class="n">namings</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">new</span><span class="o">.</span><span class="n">namings</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">new</span>
    <span class="k">def</span> <span class="nf">at_toplevel</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope</span> <span class="o">==</span> <span class="mi">0</span>
</pre></div>
<p>This is all pretty boring boilerplate. Let's move on.</p>
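<p>The one non-obvious part is the <code>__getattr__</code> proxy, so
here is a minimal, self-contained sketch of that pattern. The
<code>MiniContext</code> class is a hypothetical, trimmed-down stand-in
for <code>Context</code> (it uses instance attributes and an explicit
<code>_</code> separator, which are my assumptions, not the code
above). Python only calls <code>__getattr__</code> when normal lookup
fails, so <code>ctx.body</code> resolves normally while
<code>ctx.body_writeln</code> gets forwarded to the writer.</p>

```python
class Writer:
    def __init__(self):
        self.content = ""

    def write(self, exp, indent=0):
        self.content += (" " * indent) + exp

    def writeln(self, stmt, indent=0):
        self.write(stmt + "\n", indent)

    def write_statement(self, stmt, indent=0):
        self.writeln(stmt + ";", indent)


class MiniContext:
    """Hypothetical stand-in for Context, showing only the proxying."""

    def __init__(self):
        self.initializations = Writer()
        self.body = Writer()
        self.indentation = 0

    def __getattr__(self, name):
        # Reached only when regular attribute lookup fails.
        for output in ("initializations", "body"):
            prefix = output + "_"
            if name.startswith(prefix):
                method = getattr(getattr(self, output), name[len(prefix):])
                # Fall back to the current indentation when none is given.
                return lambda s, i=None: method(
                    s, self.indentation if i is None else i)
        raise AttributeError(name)


ctx = MiniContext()
ctx.body_writeln("int main() {")
ctx.indentation += 2
ctx.body_write_statement("return 0")
ctx.indentation -= 2
ctx.body_writeln("}")
ctx.initializations_write_statement("x = PyLong_FromLong(1)", 2)
print(ctx.body.content)
```

<p>The body and initialization regions accumulate independently, which
is what lets the entrypoint splice them into different places in the
generated C file.</p>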
<h3 id="pyc/<strong>main</strong>.py">pyc/__main__.py</h3><p>The entrypoint is responsible for reading source code, parsing it,
calling the code generator, writing the source code to a C file, and
compiling it.</p>
<p>First, we read and parse the source code:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">ast</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
<span class="kn">import</span> <span class="nn">shutil</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">from</span> <span class="nn">context</span> <span class="kn">import</span> <span class="n">Context</span>
<span class="kn">from</span> <span class="nn">codegen</span> <span class="kn">import</span> <span class="n">generate</span>
<span class="n">BUILTINS</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">"print"</span><span class="p">:</span> <span class="s2">"PYC_Print"</span><span class="p">,</span>
<span class="p">}</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">target</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">target</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">source</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
    <span class="n">tree</span> <span class="o">=</span> <span class="n">ast</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span>
</pre></div>
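<p>To get a feel for what <code>ast.parse</code> hands back, here is a
small sketch that parses a made-up program (the source string and the
<code>example.py</code> filename are just for illustration) and lists
the statement node types the code generator would have to handle.</p>

```python
import ast

# A made-up input program, roughly the shape this compiler targets.
source = """
def main():
    x = 1
    print(x)
    return 0
"""

tree = ast.parse(source, "example.py")

# The module body is a list of statement nodes.
top_level = [type(stmt).__name__ for stmt in tree.body]
# A function definition has its own body of statements.
inner = [type(stmt).__name__ for stmt in tree.body[0].body]

print(top_level)
print(inner)
```

<p>Note that the call to <code>print</code> shows up as an
<code>Expr</code> statement wrapping a <code>Call</code> expression.</p>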
<p>Then we write <code>libpyc.c</code> into the body, register builtins,
and run code generation:</p>
<div class="highlight"><pre><span></span><span class="o">...</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="o">...</span>
    <span class="n">ctx</span> <span class="o">=</span> <span class="n">Context</span><span class="p">()</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"libpyc.c"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">ctx</span><span class="o">.</span><span class="n">body_write</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <span class="o">+</span> <span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">builtin</span><span class="p">,</span> <span class="n">fn</span> <span class="ow">in</span> <span class="n">BUILTINS</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
        <span class="n">ctx</span><span class="o">.</span><span class="n">register_global</span><span class="p">(</span><span class="n">builtin</span><span class="p">,</span> <span class="n">fn</span><span class="p">)</span>
    <span class="n">generate</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">tree</span><span class="p">)</span>
</pre></div>
<p>Next, we create a clean output directory and write
<code>main.c</code> with the generated code and a <code>main</code>
function to initialize Python and any global variables:</p>
<div class="highlight"><pre><span></span><span class="o">...</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="o">...</span>
    <span class="c1"># Create and move to working directory</span>
    <span class="n">outdir</span> <span class="o">=</span> <span class="s2">"bin"</span>
    <span class="n">shutil</span><span class="o">.</span><span class="n">rmtree</span><span class="p">(</span><span class="n">outdir</span><span class="p">,</span> <span class="n">ignore_errors</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">os</span><span class="o">.</span><span class="n">mkdir</span><span class="p">(</span><span class="n">outdir</span><span class="p">)</span>
    <span class="n">os</span><span class="o">.</span><span class="n">chdir</span><span class="p">(</span><span class="n">outdir</span><span class="p">)</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"main.c"</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">ctx</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">content</span><span class="p">)</span>
        <span class="n">main</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">namings</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"main"</span><span class="p">)[</span><span class="s2">"name"</span><span class="p">]</span>
        <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="sa">f</span><span class="s2">"""int main(int argc, char *argv[]) </span><span class="se">{{</span>
<span class="s2"> Py_Initialize();</span>
<span class="s2"> // Initialize globals, if any.</span>
<span class="si">{</span><span class="n">ctx</span><span class="o">.</span><span class="n">initializations</span><span class="o">.</span><span class="n">content</span><span class="si">}</span>
<span class="s2"> PyObject* r = </span><span class="si">{</span><span class="n">main</span><span class="si">}</span><span class="s2">();</span>
<span class="s2"> return PyLong_AsLong(r);</span>
<span class="se">}}</span><span class="s2">"""</span><span class="p">)</span>
</pre></div>
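<p>The doubled braces in that template are easy to miss: inside an
f-string, <code>{{</code> and <code>}}</code> produce literal braces,
which is how the C function body survives formatting. A small sketch,
where <code>main_name</code> and <code>initializations</code> are
hypothetical stand-ins for the values pulled from the context:</p>

```python
# Hypothetical values standing in for what the code generator produces.
main_name = "main_1"
initializations = "  x_0 = PyLong_FromLong(1);\n"

# {{ and }} become literal braces; single braces interpolate values.
c_main = f"""int main(int argc, char *argv[]) {{
  Py_Initialize();
{initializations}  PyObject* r = {main_name}();
  return PyLong_AsLong(r);
}}"""

print(c_main)
```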
<p>Finally, we run <code>clang-format</code> and <code>gcc</code> against
the generated C code:</p>
<div class="highlight"><pre><span></span><span class="o">...</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="o">...</span>
    <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">([</span><span class="s2">"clang-format"</span><span class="p">,</span> <span class="s2">"-i"</span><span class="p">,</span> <span class="s2">"main.c"</span><span class="p">])</span>
    <span class="n">cflags_raw</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">([</span><span class="s2">"python3-config"</span><span class="p">,</span> <span class="s2">"--cflags"</span><span class="p">])</span>
    <span class="n">cflags</span> <span class="o">=</span> <span class="p">[</span><span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">cflags_raw</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">)</span> <span class="k">if</span> <span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()]</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"gcc"</span><span class="p">,</span> <span class="s2">"-c"</span><span class="p">,</span> <span class="s2">"-o"</span><span class="p">,</span> <span class="s2">"main.o"</span><span class="p">]</span> <span class="o">+</span> <span class="n">cflags</span> <span class="o">+</span> <span class="p">[</span><span class="s2">"main.c"</span><span class="p">]</span>
    <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span>
    <span class="n">ldflags_raw</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">([</span><span class="s2">"python3-config"</span><span class="p">,</span> <span class="s2">"--ldflags"</span><span class="p">])</span>
    <span class="n">ldflags</span> <span class="o">=</span> <span class="p">[</span><span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">ldflags_raw</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">)</span> <span class="k">if</span> <span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()]</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"gcc"</span><span class="p">]</span> <span class="o">+</span> <span class="n">ldflags</span> <span class="o">+</span> <span class="p">[</span><span class="s2">"main.o"</span><span class="p">]</span>
    <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span>
</pre></div>
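<p>As an aside, the flag splitting could also be done with the standard
library's <code>shlex.split</code>, which handles repeated whitespace,
trailing newlines, and quoted arguments. This is a swapped-in
alternative, not what the code above does, and the flag strings below
are made-up sample output rather than what
<code>python3-config</code> prints on any particular machine. The
<code>build_commands</code> helper is hypothetical and only builds the
argument lists without running anything.</p>

```python
import shlex


def build_commands(cflags_raw, ldflags_raw):
    """Build the compile and link argv lists from raw *-config output."""
    cflags = shlex.split(cflags_raw.decode())
    ldflags = shlex.split(ldflags_raw.decode())
    compile_cmd = ["gcc", "-c", "-o", "main.o"] + cflags + ["main.c"]
    link_cmd = ["gcc"] + ldflags + ["main.o"]
    return compile_cmd, link_cmd


# Made-up sample output, for illustration only.
compile_cmd, link_cmd = build_commands(
    b"-I/usr/include/python3.11  -Wall -O2\n",
    b"-L/usr/lib -lpython3.11\n",
)
print(compile_cmd)
print(link_cmd)
```

<p>Passing <code>check=True</code> to <code>subprocess.run</code> would
additionally turn a failed compile into an exception instead of a
silently ignored exit code.</p>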
<p>All together:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">ast</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">subprocess</span>
<span class="kn">import</span> <span class="nn">shutil</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">from</span> <span class="nn">context</span> <span class="kn">import</span> <span class="n">Context</span>
<span class="kn">from</span> <span class="nn">codegen</span> <span class="kn">import</span> <span class="n">generate</span>
<span class="n">BUILTINS</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s2">"print"</span><span class="p">:</span> <span class="s2">"PYC_Print"</span><span class="p">,</span>
<span class="p">}</span>
<span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
    <span class="n">target</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">target</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">source</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
    <span class="n">tree</span> <span class="o">=</span> <span class="n">ast</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span>
    <span class="n">ctx</span> <span class="o">=</span> <span class="n">Context</span><span class="p">()</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"libpyc.c"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">ctx</span><span class="o">.</span><span class="n">body_write</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <span class="o">+</span> <span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">builtin</span><span class="p">,</span> <span class="n">fn</span> <span class="ow">in</span> <span class="n">BUILTINS</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
        <span class="n">ctx</span><span class="o">.</span><span class="n">register_global</span><span class="p">(</span><span class="n">builtin</span><span class="p">,</span> <span class="n">fn</span><span class="p">)</span>
    <span class="n">generate</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">tree</span><span class="p">)</span>
    <span class="c1"># Create and move to working directory</span>
    <span class="n">outdir</span> <span class="o">=</span> <span class="s2">"bin"</span>
    <span class="n">shutil</span><span class="o">.</span><span class="n">rmtree</span><span class="p">(</span><span class="n">outdir</span><span class="p">,</span> <span class="n">ignore_errors</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
    <span class="n">os</span><span class="o">.</span><span class="n">mkdir</span><span class="p">(</span><span class="n">outdir</span><span class="p">)</span>
    <span class="n">os</span><span class="o">.</span><span class="n">chdir</span><span class="p">(</span><span class="n">outdir</span><span class="p">)</span>
    <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">"main.c"</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
        <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">ctx</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">content</span><span class="p">)</span>
        <span class="n">main</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">namings</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"main"</span><span class="p">)[</span><span class="s2">"name"</span><span class="p">]</span>
        <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="sa">f</span><span class="s2">"""int main(int argc, char *argv[]) </span><span class="se">{{</span>
<span class="s2"> Py_Initialize();</span>
<span class="s2"> // Initialize globals, if any.</span>
<span class="si">{</span><span class="n">ctx</span><span class="o">.</span><span class="n">initializations</span><span class="o">.</span><span class="n">content</span><span class="si">}</span>
<span class="s2"> PyObject* r = </span><span class="si">{</span><span class="n">main</span><span class="si">}</span><span class="s2">();</span>
<span class="s2"> return PyLong_AsLong(r);</span>
<span class="se">}}</span><span class="s2">"""</span><span class="p">)</span>
    <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">([</span><span class="s2">"clang-format"</span><span class="p">,</span> <span class="s2">"-i"</span><span class="p">,</span> <span class="s2">"main.c"</span><span class="p">])</span>
    <span class="n">cflags_raw</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">([</span><span class="s2">"python3-config"</span><span class="p">,</span> <span class="s2">"--cflags"</span><span class="p">])</span>
    <span class="n">cflags</span> <span class="o">=</span> <span class="p">[</span><span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">cflags_raw</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">)</span> <span class="k">if</span> <span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()]</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"gcc"</span><span class="p">,</span> <span class="s2">"-c"</span><span class="p">,</span> <span class="s2">"-o"</span><span class="p">,</span> <span class="s2">"main.o"</span><span class="p">]</span> <span class="o">+</span> <span class="n">cflags</span> <span class="o">+</span> <span class="p">[</span><span class="s2">"main.c"</span><span class="p">]</span>
    <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span>
    <span class="n">ldflags_raw</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">([</span><span class="s2">"python3-config"</span><span class="p">,</span> <span class="s2">"--ldflags"</span><span class="p">])</span>
    <span class="n">ldflags</span> <span class="o">=</span> <span class="p">[</span><span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">ldflags_raw</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">" "</span><span class="p">)</span> <span class="k">if</span> <span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()]</span>
    <span class="n">cmd</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"gcc"</span><span class="p">]</span> <span class="o">+</span> <span class="n">ldflags</span> <span class="o">+</span> <span class="p">[</span><span class="s2">"main.o"</span><span class="p">]</span>
    <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span>
<span class="n">main</span><span class="p">()</span>
</pre></div>
<p>Done!</p>
<h3 id="pyc/codegen.py">pyc/codegen.py</h3><p>Lastly we write the translation layer from Python AST to C. We'll
break this out into 10 helper functions. It is helpful to have the
<a href="https://docs.python.org/3/library/ast.html#abstract-grammar">AST
spec</a> for
reference.</p>
<h4 id="1/10:-generate">1/10: generate</h4><p>The entrypoint of the code generator is <code>generate(ctx: Context,
exp)</code>. It generates code for any object with a <code>body</code>
attribute storing a list of statements. This function will generate
code for objects like modules, function bodies, if bodies, etc.</p>
<p>The statements we'll support to begin are:</p>
<ul>
<li><code>ast.Assign</code></li>
<li><code>ast.FunctionDef</code></li>
<li><code>ast.Return</code></li>
<li><code>ast.If</code></li>
<li>and <code>ast.Expr</code></li>
</ul>
<p>For each statement, we'll simply pass on generation to an associated
helper function. In the case of expression generation, though, we'll
also add a no-op operation on the result of the expression; otherwise
the C compiler will complain about an unused variable.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">module</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">stmt</span> <span class="ow">in</span> <span class="n">module</span><span class="o">.</span><span class="n">body</span><span class="p">:</span>
        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">stmt</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Assign</span><span class="p">):</span>
            <span class="n">generate_assign</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="p">)</span>
        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">stmt</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">FunctionDef</span><span class="p">):</span>
            <span class="n">generate_function_def</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="p">)</span>
        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">stmt</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Return</span><span class="p">):</span>
            <span class="n">generate_return</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="p">)</span>
        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">stmt</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">If</span><span class="p">):</span>
            <span class="n">generate_if</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="p">)</span>
        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">stmt</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Expr</span><span class="p">):</span>
            <span class="n">r</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
            <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="s2">"// noop to hide unused warning"</span><span class="p">)</span>
            <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">r</span><span class="si">}</span><span class="s2"> += 0"</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Unsupported statement type: </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">stmt</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</pre></div>
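<p>To see which node types this dispatcher actually encounters, here is a
small sketch that only uses the standard <code>ast</code> module. The sample
program in <code>SOURCE</code> is made up for illustration and is not one of
the compiler's test files.</p>

```python
import ast

# Hypothetical sample program, used only to show which node types appear.
SOURCE = """
x = 1
def f(n):
    if n == 0:
        return n
    return f(n - 1)
f(x)
"""

module = ast.parse(SOURCE)

# Node types the dispatcher would see for the module's direct children.
top_level = [type(stmt).__name__ for stmt in module.body]
print(top_level)

# A function body is also a list of statements, so a dispatcher that
# loops over `.body` can be reused on an ast.FunctionDef node.
function_body = [type(stmt).__name__ for stmt in module.body[1].body]
print(function_body)
```

<p>Both the module and the function expose their statements through a
<code>body</code> attribute, which is what lets one loop serve both.</p>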
<p class="note">
  Remember to throw exceptions aggressively; otherwise you'll have a
  bad time debugging programs that use syntax the compiler doesn't support yet.
</p><p>Let's dig into these helpers.</p>
<h4 id="2/10:-generate_assign">2/10: generate_assign</h4><p>To generate assignment code, we need to check if we're at the
top-level or not. If we're at the top-level we can declare the
variable but we can't initialize it yet. So we add the initialization
code to the <code>initialization</code> section of the program.</p>
<p>If we're not at the top-level, we can declare and assign in one
statement.</p>
<p>Before doing either though, we register the variable name so we can
get a safe local name to use in generated code. Then we compile the
right-hand side so we can assign it to the left-hand side.</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">ast</span>
<span class="kn">from</span> <span class="nn">context</span> <span class="kn">import</span> <span class="n">Context</span>

<span class="k">def</span> <span class="nf">initialize_variable</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">val</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">ctx</span><span class="o">.</span><span class="n">at_toplevel</span><span class="p">():</span>
        <span class="n">decl</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">"PyObject* </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2">"</span>
        <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="n">decl</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
        <span class="n">init</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2"> = </span><span class="si">{</span><span class="n">val</span><span class="si">}</span><span class="s2">"</span>
        <span class="n">ctx</span><span class="o">.</span><span class="n">initializations_write_statement</span><span class="p">(</span><span class="n">init</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">"PyObject* </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2"> = </span><span class="si">{</span><span class="n">val</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">generate_assign</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">stmt</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">Assign</span><span class="p">):</span>
    <span class="c1"># TODO: support assigning to a tuple</span>
    <span class="n">local</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="n">stmt</span><span class="o">.</span><span class="n">targets</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
    <span class="n">val</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
    <span class="n">initialize_variable</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">local</span><span class="p">,</span> <span class="n">val</span><span class="p">)</span>
</pre></div>
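<p>The top-level split exists because C only allows constant expressions as
initializers at file scope, and a call like <code>PyLong_FromLong(1)</code>
isn't one. The sketch below shows the two output shapes side by side.
<code>MiniContext</code> is a hypothetical stand-in written for this example;
it is not the real <code>Context</code> class and only mimics the handful of
methods used here.</p>

```python
class MiniContext:
    """Hypothetical stand-in for the post's Context; not the real class."""

    def __init__(self, scope: int = 0):
        self.scope = scope
        self.body = []             # declarations and function code
        self.initializations = []  # assumed to run once at program start

    def at_toplevel(self) -> bool:
        return self.scope == 0

    def body_write_statement(self, stmt: str, indent: int = 0):
        self.body.append("  " * indent + stmt + ";")

    def initializations_write_statement(self, stmt: str):
        self.initializations.append(stmt + ";")


def initialize_variable(ctx, name: str, val: str):
    # Same branching as the helper above.
    if ctx.at_toplevel():
        ctx.body_write_statement(f"PyObject* {name}", 0)
        ctx.initializations_write_statement(f"{name} = {val}")
    else:
        ctx.body_write_statement(f"PyObject* {name} = {val}")


top = MiniContext(scope=0)
initialize_variable(top, "x", "PyLong_FromLong(1)")

nested = MiniContext(scope=1)
initialize_variable(nested, "x", "PyLong_FromLong(1)")

print(top.body, top.initializations)
print(nested.body, nested.initializations)
```

<p>At the top level the declaration and the assignment land in different
sections; inside a function they collapse into a single statement.</p>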
<p>We're going to need to implement <code>generate_expression</code> to
make this work.</p>
<h4 id="3/10:-generate_expression">3/10: generate_expression</h4><p>Just like for statements in <code>generate</code>, there are a few
kinds of expressions we need to implement:</p>
<ul>
<li><code>ast.Num</code></li>
<li><code>ast.BinOp</code></li>
<li><code>ast.BoolOp</code></li>
<li><code>ast.Name</code></li>
<li><code>ast.Compare</code></li>
<li>and <code>ast.Call</code></li>
</ul>
<p>For <code>ast.Num</code>, we just need to wrap the literal number in a
Python integer object using <code>PyLong_FromLong</code>, which returns a
<code>PyObject*</code>. And for <code>ast.Name</code> we just need to
look up the local name in context. Otherwise we delegate to more
helper functions.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Num</span><span class="p">):</span>
        <span class="c1"># TODO: deal with non-integers</span>
        <span class="n">tmp</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="s2">"num"</span><span class="p">)</span>
        <span class="n">initialize_variable</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">tmp</span><span class="p">,</span> <span class="sa">f</span><span class="s2">"PyLong_FromLong(</span><span class="si">{</span><span class="n">exp</span><span class="o">.</span><span class="n">n</span><span class="si">}</span><span class="s2">)"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">tmp</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">BinOp</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">generate_bin_op</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">BoolOp</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">generate_bool_op</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Name</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">ctx</span><span class="o">.</span><span class="n">get_local</span><span class="p">(</span><span class="n">exp</span><span class="o">.</span><span class="n">id</span><span class="p">)[</span><span class="s2">"name"</span><span class="p">]</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Compare</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">generate_compare</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Call</span><span class="p">):</span>
        <span class="k">return</span> <span class="n">generate_call</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span>
    <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Unsupported expression: </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">exp</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</pre></div>
<p>For every code generation helper that is an expression, we store the
expression in a local variable and return the variable's name so that
parent nodes in the AST can refer to the child. This can result in
inefficient code generation (useless assignment) but that's not really
a big deal for a project like this and will likely be optimized away
by GCC anyway. The more annoying aspect is that useless assignment
just makes the generated code harder to read.</p>
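<p>Here is a toy version of that "one temporary per node" idea, kept
separate from the compiler. It emits readable pseudo-statements instead of C,
and <code>flatten</code> is a hypothetical helper invented for this sketch. It
also matches on <code>ast.Constant</code>, which is how recent Python versions
represent number literals; that is an assumption about the Python version
running the example, not something the compiler above does.</p>

```python
import ast
import itertools


def flatten(exp: ast.expr, out: list, counter) -> str:
    """Emit one pseudo-statement per node; return the name holding its value."""
    if isinstance(exp, ast.Constant):
        name = f"num_{next(counter)}"
        out.append(f"{name} = {exp.value}")
        return name
    if isinstance(exp, ast.Name):
        # Variables already have a name, so no temporary is needed.
        return exp.id
    if isinstance(exp, ast.BinOp):
        left = flatten(exp.left, out, counter)
        right = flatten(exp.right, out, counter)
        op = {ast.Add: "+", ast.Sub: "-"}.get(type(exp.op))
        if op is None:
            raise Exception(f"Unsupported binary operator: {type(exp.op)}")
        name = f"binop_{next(counter)}"
        out.append(f"{name} = {left} {op} {right}")
        return name
    raise Exception(f"Unsupported expression: {type(exp)}")


statements = []
tree = ast.parse("n - 1 + 2", mode="eval")
result = flatten(tree.body, statements, itertools.count(1))

for line in statements:
    print(line)
print("result is in", result)
```

<p>Each parent only needs the name returned by its children, at the cost of
temporaries like <code>num_1</code> that a hand-written version would
inline.</p>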
<h4 id="4/10:-generate_bin_op">4/10: generate_bin_op</h4><p>For binary operators we need to support addition and
subtraction. Other operators, such as equality and <code>and</code>/<code>or</code>, are
represented by <code>ast.Compare</code> and <code>ast.BoolOp</code> instead.</p>
<p>This is easy to write because we already prepared helpers in
<code>libpyc.c</code>: <code>PYC_Sub</code> and <code>PYC_Add</code>.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_bin_op</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">binop</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">BinOp</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="s2">"binop"</span><span class="p">)</span>
    <span class="n">l</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">binop</span><span class="o">.</span><span class="n">left</span><span class="p">)</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">binop</span><span class="o">.</span><span class="n">right</span><span class="p">)</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">binop</span><span class="o">.</span><span class="n">op</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Add</span><span class="p">):</span>
        <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">"PyObject* </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = PYC_Add(</span><span class="si">{</span><span class="n">l</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="n">r</span><span class="si">}</span><span class="s2">)"</span><span class="p">)</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">binop</span><span class="o">.</span><span class="n">op</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Sub</span><span class="p">):</span>
        <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">"PyObject* </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = PYC_Sub(</span><span class="si">{</span><span class="n">l</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="n">r</span><span class="si">}</span><span class="s2">)"</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Unsupported binary operator: </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">binop</span><span class="o">.</span><span class="n">op</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">result</span>
</pre></div>
<p>Easy enough.</p>
<h4 id="5/10:-generate_bool_op">5/10: generate_bool_op</h4><p>We only need to support <code>or</code> for the Fibonacci program, but
<code>or</code> in Python is more complicated than in C. In Python,
the first value to be truthy short-circuits the expression and the
result is its value, not <code>True</code>.</p>
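<p>To make that rule concrete, here is a small plain-Python model of
<code>or</code>. It isn't part of the compiler; <code>py_or</code> and
<code>operand</code> are hypothetical names for this sketch. Operands are
passed as zero-argument functions so we can observe which ones actually get
evaluated.</p>

```python
def py_or(*thunks):
    """Model of `or`: evaluate left to right, stop at the first truthy value."""
    result = None
    for thunk in thunks:
        result = thunk()
        if result:
            break  # short-circuit: later operands are never evaluated
    return result


evaluated = []


def operand(value):
    """Wrap a value so that evaluating it is recorded in `evaluated`."""
    def thunk():
        evaluated.append(value)
        return value
    return thunk


first = py_or(operand(0), operand(7), operand(9))
second = py_or(operand(0), operand(""))

print(first, repr(second), evaluated)
```

<p>The first call yields <code>7</code> without touching <code>9</code>. The
second yields the last operand even though it is falsy, which is what the
built-in operator does too.</p>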
<p>We'll use <code>goto</code> to short-circuit and we'll use
<a href="https://docs.python.org/3/c-api/object.html#c.PyObject_IsTrue">PyObject_IsTrue</a>
to do the truthy check:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_bool_op</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">boolop</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">BoolOp</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="s2">"boolop"</span><span class="p">)</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">"PyObject* </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">boolop</span><span class="o">.</span><span class="n">op</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Or</span><span class="p">):</span>
        <span class="n">done_or</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="s2">"done_or"</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">exp</span> <span class="ow">in</span> <span class="n">boolop</span><span class="o">.</span><span class="n">values</span><span class="p">:</span>
            <span class="n">v</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span>
            <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = </span><span class="si">{</span><span class="n">v</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
            <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="sa">f</span><span class="s2">"if (PyObject_IsTrue(</span><span class="si">{</span><span class="n">v</span><span class="si">}</span><span class="s2">)) </span><span class="se">{{</span><span class="s2">"</span><span class="p">)</span>
            <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">"goto </span><span class="si">{</span><span class="n">done_or</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="n">ctx</span><span class="o">.</span><span class="n">indentation</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
            <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="s2">"}"</span><span class="p">)</span>
        <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">done_or</span><span class="si">}</span><span class="s2">:</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Unsupported boolean operator: </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">boolop</span><span class="o">.</span><span class="n">op</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">result</span>
</pre></div>
<p class="note">
Now that I write this down I see we could probably move this
function into <code>libpyc.c</code> if we used a loop. Maybe in
the next iteration.
</p><p>We move on.</p>
<h4 id="6/10:-generate_compare">6/10: generate_compare</h4><p>This function handles equality and inequality checks. We'll use
<code>PyObject_RichCompare</code>, the counterpart of the
<code>PyObject_RichCompareBool</code> helper we used in the
hand-written translation that returns a <code>PyObject*</code> rather than a C <code>int</code>.</p>
<p>The only additional thing to keep in mind is that the operators and the
right-hand operands are passed as lists (<code>ops</code> and
<code>comparators</code>). So we have to iterate through them and apply each
equality/inequality check in turn.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_compare</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">exp</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">Compare</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="s2">"compare"</span><span class="p">)</span>
    <span class="n">left</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="o">.</span><span class="n">left</span><span class="p">)</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">"PyObject* </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = </span><span class="si">{</span><span class="n">left</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">op</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">exp</span><span class="o">.</span><span class="n">ops</span><span class="p">):</span>
        <span class="n">v</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="o">.</span><span class="n">comparators</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
        <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Eq</span><span class="p">):</span>
            <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = PyObject_RichCompare(</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="n">v</span><span class="si">}</span><span class="s2">, Py_EQ)"</span><span class="p">)</span>
        <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">NotEq</span><span class="p">):</span>
            <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = PyObject_RichCompare(</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="n">v</span><span class="si">}</span><span class="s2">, Py_NE)"</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Unsupported comparison: </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">op</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">result</span>
</pre></div>
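<p>For reference, this is the shape of an <code>ast.Compare</code> node, shown
with the standard <code>ast</code> module on a made-up expression. The last
two lines illustrate a caveat: Python evaluates a chain such as
<code>a == b == c</code> as <code>a == b and b == c</code>, while feeding each
result into the next comparison (as the loop above does) is a left-to-right
fold. The two agree for the single comparisons in the fib program, so treat
this as a note for later rather than a bug report.</p>

```python
import ast

node = ast.parse("a == b != c", mode="eval").body

left_name = node.left.id
# Each operator is paired with the operand to its right.
steps = [(type(op).__name__, cmp.id) for op, cmp in zip(node.ops, node.comparators)]

# Python's chained form versus a left-to-right fold of the same operators.
chained = 2 == 2 == 2    # means (2 == 2) and (2 == 2)
folded = (2 == 2) == 2   # compares True with 2

print(left_name, steps, chained, folded)
```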
<h4 id="7/10:-generate_call">7/10: generate_call</h4><p>The last expression is simple enough. We compile the call's arguments
first, then the function itself, then we call the function with the
arguments like any C function. Calling the C function directly will
have ramifications for interacting with Python libraries (basically,
we won't be able to interact with any) but it's the easiest way to get
started.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_call</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">exp</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">Call</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span>
    <span class="n">args</span> <span class="o">=</span> <span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">exp</span><span class="o">.</span><span class="n">args</span><span class="p">])</span>
    <span class="n">fun</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="o">.</span><span class="n">func</span><span class="p">)</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="s2">"call_result"</span><span class="p">)</span>
    <span class="c1"># TODO: lambdas and closures need additional work</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span>
        <span class="sa">f</span><span class="s2">"PyObject* </span><span class="si">{</span><span class="n">res</span><span class="si">}</span><span class="s2"> = </span><span class="si">{</span><span class="n">fun</span><span class="si">}</span><span class="s2">(</span><span class="si">{</span><span class="n">args</span><span class="si">}</span><span class="s2">)"</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">res</span>
</pre></div>
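<p>As a quick check of what <code>exp.func</code> and <code>exp.args</code>
hold, here is the <code>ast.Call</code> node for a call like the recursive one
in the fib program (standard <code>ast</code> module only; the expression
string is just an example):</p>

```python
import ast

call = ast.parse("fib(n - 1)", mode="eval").body

# The callee is an ordinary expression; here it is a plain name.
callee_kind = type(call.func).__name__
callee = call.func.id
# Arguments are expressions too, so they go through the same expression path.
arg_kinds = [type(arg).__name__ for arg in call.args]

print(callee_kind, callee, arg_kinds)
```

<p>Because the callee is an <code>ast.Name</code>, it resolves through the
same local lookup as any variable.</p>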
<p>And that's it for expressions! Just a few more statement helpers to
support.</p>
<h4 id="8/10:-generate_function_def">8/10: generate_function_def</h4><p>This is a fun one. First we register the function name in scope. Then
we copy the context so variables within the function body are
contained within the function body. We increment <code>scope</code> so
we know we've left the top-level. Finally, we compile the body.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_function_def</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">fd</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">FunctionDef</span><span class="p">):</span>
    <span class="n">name</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="n">fd</span><span class="o">.</span><span class="n">name</span><span class="p">)</span>
    <span class="n">childCtx</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
    <span class="n">args</span> <span class="o">=</span> <span class="s2">", "</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="sa">f</span><span class="s2">"PyObject* </span><span class="si">{</span><span class="n">childCtx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="n">a</span><span class="o">.</span><span class="n">arg</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">fd</span><span class="o">.</span><span class="n">args</span><span class="o">.</span><span class="n">args</span><span class="p">])</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="sa">f</span><span class="s2">"PyObject* </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2">(</span><span class="si">{</span><span class="n">args</span><span class="si">}</span><span class="s2">) </span><span class="se">{{</span><span class="s2">"</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
    <span class="n">childCtx</span><span class="o">.</span><span class="n">scope</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="n">childCtx</span><span class="o">.</span><span class="n">indentation</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="n">generate</span><span class="p">(</span><span class="n">childCtx</span><span class="p">,</span> <span class="n">fd</span><span class="p">)</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">childCtx</span><span class="o">.</span><span class="n">ret</span><span class="p">:</span>
        <span class="n">childCtx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="s2">"return Py_None"</span><span class="p">)</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="s2">"}</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
</pre></div>
<p>The check for <code>childCtx.ret</code> isn't strictly necessary
because we could just emit a return even if there already was
one. Asking <code>generate_return</code> to set this attribute and
having <code>generate_function_def</code> check it just makes the
generated code a little prettier.</p>
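<p>To show just the signature-building step in isolation, here is a sketch
that turns an <code>ast.FunctionDef</code> into a C prototype string.
<code>c_signature</code> is a hypothetical helper for this example. Unlike the
real function above, it uses the Python names directly instead of the safe
names returned by <code>register_local</code>, and it only handles plain
positional parameters.</p>

```python
import ast


def c_signature(fd: ast.FunctionDef) -> str:
    # Only plain positional parameters are handled in this sketch.
    params = ", ".join(f"PyObject* {a.arg}" for a in fd.args.args)
    return f"PyObject* {fd.name}({params})"


fd = ast.parse("def fib(n):\n    return n\n").body[0]
signature = c_signature(fd)

# Whether the body's direct children include a return statement.
has_return = any(isinstance(stmt, ast.Return) for stmt in fd.body)

print(signature, has_return)
```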
<h4 id="9/10:-generate_return">9/10: generate_return</h4><p>Very straightforward: we just compile the value to be returned and
then we emit a <code>return</code> statement.</p>
<p>We store the return value so that the function definition can know
whether to add a <code>return Py_None</code> statement.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_return</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">r</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">Return</span><span class="p">):</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">ret</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="s2">""</span><span class="p">)</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">"return </span><span class="si">{</span><span class="n">ctx</span><span class="o">.</span><span class="n">ret</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</pre></div>
<p>And we've got one last statement to support!</p>
<h4 id="10/10:-generate_if">10/10: generate_if</h4><p>You know the deal: compile the test and if the test is truthy, enter
the compiled body. We'll deal with the else body another time.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_if</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">exp</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">If</span><span class="p">):</span>
    <span class="n">test</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="o">.</span><span class="n">test</span><span class="p">)</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="sa">f</span><span class="s2">"if (PyObject_IsTrue(</span><span class="si">{</span><span class="n">test</span><span class="si">}</span><span class="s2">)) </span><span class="se">{{</span><span class="s2">"</span><span class="p">)</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">indentation</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="n">generate</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span>
    <span class="c1"># TODO: handle exp.orelse</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">indentation</span> <span class="o">-=</span> <span class="mi">1</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="s2">"}</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
</pre></div>
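<p>For the else body left as a TODO, here is a minimal, self-contained sketch of one way the emitted C could grow an <code>else</code> branch. <code>SketchContext</code> and <code>emit_if</code> are hypothetical stand-ins, not part of the compiler in this post: they take already-compiled statement strings instead of walking <code>exp.body</code> and <code>exp.orelse</code>, and they only mimic <code>body_writeln</code> and <code>indentation</code>.</p>

```python
class SketchContext:
    """Hypothetical stand-in for the post's Context: collects indented output lines."""

    def __init__(self):
        self.indentation = 0
        self.lines = []

    def body_writeln(self, line):
        self.lines.append("  " * self.indentation + line)


def emit_if(ctx, test, body, orelse=()):
    """Emit a C if statement, plus an else block when orelse is non-empty."""
    ctx.body_writeln(f"if (PyObject_IsTrue({test})) {{")
    ctx.indentation += 1
    for stmt in body:
        ctx.body_writeln(stmt)
    ctx.indentation -= 1
    if orelse:
        ctx.body_writeln("} else {")
        ctx.indentation += 1
        for stmt in orelse:
            ctx.body_writeln(stmt)
        ctx.indentation -= 1
    ctx.body_writeln("}")


ctx = SketchContext()
# "tmp0" is a made-up name for the variable holding the compiled test.
emit_if(ctx, "tmp0", ["return n;"], ["return PyLong_FromLong(0);"])
generated = "\n".join(ctx.lines)
print(generated)
```

<p>In the real compiler the two loops would presumably compile the statements in the if body and in <code>exp.orelse</code> rather than write fixed strings.</p>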
<p>And we're done with the compiler!</p>
<h3 id="trying-it-out">Trying it out</h3><p>As promised:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>tests/recursive_fib.py
def<span class="w"> </span>fib<span class="o">(</span>n<span class="o">)</span>:
<span class="w">    </span><span class="k">if</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>or<span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span>:
<span class="w">        </span><span class="k">return</span><span class="w"> </span>n
<span class="w">    </span><span class="k">return</span><span class="w"> </span>fib<span class="o">(</span>n<span class="w"> </span>-<span class="w"> </span><span class="m">1</span><span class="o">)</span><span class="w"> </span>+<span class="w"> </span>fib<span class="o">(</span>n<span class="w"> </span>-<span class="w"> </span><span class="m">2</span><span class="o">)</span>
def<span class="w"> </span>main<span class="o">()</span>:
<span class="w">    </span>print<span class="o">(</span>fib<span class="o">(</span><span class="m">40</span><span class="o">))</span>
$<span class="w"> </span>python3<span class="w"> </span>pyc<span class="w"> </span>tests/recursive_fib.py
$<span class="w"> </span>./bin/a.out
<span class="m">102334155</span>
</pre></div>
<h4 id="microbenchmarking,-or-making-compiler-twitter-unhappy">Microbenchmarking, or making compiler Twitter unhappy</h4><p>Keep in mind this implementation does a small fraction of what CPython
is doing.</p>
<p>If you time the generated code:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>python3<span class="w"> </span>pyc<span class="w"> </span>tests/recursive_fib.py
$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>./bin/a.out
<span class="m">102334155</span>
./bin/a.out<span class="w"> </span><span class="m">18</span>.69s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.03s<span class="w"> </span>system<span class="w"> </span><span class="m">99</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">18</span>.854<span class="w"> </span>total
</pre></div>
<p>And CPython (with <code>main()</code> appended to the source):</p>
<div class="highlight"><pre><span></span><span class="nb">time</span><span class="w"> </span>python3<span class="w"> </span>tests/recursive_fib.py
<span class="m">102334155</span>
python3<span class="w"> </span>tests/recursive_fib.py<span class="w"> </span><span class="m">76</span>.24s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.11s<span class="w"> </span>system<span class="w"> </span><span class="m">99</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">1</span>:16.81<span class="w"> </span>total
</pre></div>
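<p>For a rough sense of the ratio, the two user CPU times above can be divided directly. This only restates numbers already shown, from a single run on one machine, so treat it as a ballpark rather than a benchmark result.</p>

```python
# User CPU seconds copied from the two `time` outputs above.
compiled_user_s = 18.69
cpython_user_s = 76.24

# How many times longer CPython spent than the compiled binary.
speedup = cpython_user_s / compiled_user_s
print(f"compiled binary: about {speedup:.1f}x less user CPU time on this run")
```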
<p>The only reason I mention this is that when I did a <a href="/compiling-dynamic-programming-languages.html#next-steps-with-jsc">similar
compiler project for JavaScript targeting
C++/libV8</a>,
the generated code was about the same or a little slower in speed.</p>
<p>I haven't gotten <em>that much</em> better at writing these compilers.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Latest post up, on writing a simple Python to C compiler (in Python).<a href="https://t.co/4kkji0XXbp">https://t.co/4kkji0XXbp</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1295134027335204865?ref_src=twsrc%5Etfw">August 16, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/writing-a-simple-python-compiler.htmlSun, 16 Aug 2020 00:00:00 +0000
- A single-node Kubernetes cluster without virtualization or a container registryhttp://notes.eatonphil.com/a-single-node-kubernetes-cluster-without-virtualization-or-a-container-registry.html<p>This post is a recipe for setting up a minimal Kubernetes cluster on
Fedora without requiring virtualization or a container registry. These
two features make the system cloud-agnostic and the cluster entirely
self-contained. The post will end with us running a simple Flask app
from a local container.</p>
<p>This setup is primarily useful for simple CI environments or
application development on Linux. (Docker Desktop has better tooling
for development on Mac or Windows.)</p>
<h3 id="getting-kubernetes">Getting Kubernetes</h3><p>The core of this effort is <a href="https://k3s.io/">K3s</a>, a Kubernetes
distribution that allows us to run on a single node without
virtualization.</p>
<p>But first off, <a href="https://docs.docker.com/engine/install/fedora/">install Docker</a>.</p>
<p>Then install K3s:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-sfL<span class="w"> </span>https://get.k3s.io<span class="w"> </span><span class="p">|</span><span class="w"> </span>sh<span class="w"> </span>-
</pre></div>
<p>It may prompt you to adjust some SELinux policies like so:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>dnf<span class="w"> </span>install<span class="w"> </span>-y<span class="w"> </span>container-selinux<span class="w"> </span>selinux-policy-base
$<span class="w"> </span>sudo<span class="w"> </span>rpm<span class="w"> </span>-i<span class="w"> </span>https://rpm.rancher.io/k3s-selinux-0.1.1-rc1.el7.noarch.rpm
</pre></div>
<p>Swap these out with whatever it prompts and retry the K3s install.</p>
<p>Finally, <a href="https://kubernetes.io/docs/tasks/tools/install-kubectl/">install
kubectl</a>:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-LO<span class="w"> </span>https://storage.googleapis.com/kubernetes-release/release/<span class="sb">`</span>curl<span class="w"> </span>-s<span class="w"> </span>https://storage.googleapis.com/kubernetes-release/release/stable.txt<span class="sb">`</span>/bin/linux/amd64/kubectl
</pre></div>
<p>Now copy the global K3s kubeconfig into <code>~/.kube/config</code>:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>mkdir<span class="w"> </span>-p<span class="w"> </span>~/.kube
$<span class="w"> </span>sudo<span class="w"> </span>cp<span class="w"> </span>/etc/rancher/k3s/k3s.yaml<span class="w"> </span>~/.kube/config
$<span class="w"> </span>sudo<span class="w"> </span>chown<span class="w"> </span><span class="nv">$USER</span>:<span class="k">$(</span>id<span class="w"> </span>-gn<span class="k">)</span><span class="w"> </span>~/.kube/config
</pre></div>
<p>And enable K3s:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>systemctl<span class="w"> </span><span class="nb">enable</span><span class="w"> </span>k3s
</pre></div>
<p>If you're on Fedora 31+ you'll need to disable cgroups v2 and reboot:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>grubby<span class="w"> </span>--args<span class="o">=</span><span class="s2">"systemd.unified_cgroup_hierarchy=0"</span><span class="w"> </span>--update-kernel<span class="o">=</span>ALL
$<span class="w"> </span>sudo<span class="w"> </span>reboot
</pre></div>
<p>Finally, you can run <code>kubectl</code>:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>kubectl<span class="w"> </span>get<span class="w"> </span>pods
No<span class="w"> </span>resources<span class="w"> </span>found<span class="w"> </span><span class="k">in</span><span class="w"> </span>default<span class="w"> </span>namespace.
</pre></div>
<h3 id="a-simple-application">A simple application</h3><p>We'll create a small Flask app, containerize it, and write a
Kubernetes deployment config for it.</p>
<p>We begin with <code>app.py</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span>
<span class="n">app</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>
<span class="nd">@app</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">index</span><span class="p">():</span>
    <span class="k">return</span> <span class="s1">'Hello World, Flask!'</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="n">app</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">debug</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre></div>
<p>Then a <code>Dockerfile</code>:</p>
<div class="highlight"><pre><span></span><span class="k">FROM</span><span class="w"> </span><span class="s">python:3-slim</span>
<span class="k">RUN</span><span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>flask
<span class="k">COPY</span><span class="w"> </span>.<span class="w"> </span>/app
<span class="k">CMD</span><span class="w"> </span>python3<span class="w"> </span>/app/app.py
</pre></div>
<p>Then the deployment in <code>manifest.yaml</code>:</p>
<div class="highlight"><pre><span></span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">apps/v1</span>
<span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Deployment</span>
<span class="nt">metadata</span><span class="p">:</span>
<span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">helloworld</span>
<span class="nt">spec</span><span class="p">:</span>
<span class="w">  </span><span class="nt">selector</span><span class="p">:</span>
<span class="w">    </span><span class="nt">matchLabels</span><span class="p">:</span>
<span class="w">      </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">helloworld</span>
<span class="w">  </span><span class="nt">template</span><span class="p">:</span>
<span class="w">    </span><span class="nt">metadata</span><span class="p">:</span>
<span class="w">      </span><span class="nt">labels</span><span class="p">:</span>
<span class="w">        </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">helloworld</span>
<span class="w">    </span><span class="nt">spec</span><span class="p">:</span>
<span class="w">      </span><span class="nt">containers</span><span class="p">:</span>
<span class="w">      </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">helloworld</span>
<span class="w">        </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">helloworld</span>
<span class="w">        </span><span class="c1"># An untagged image defaults to imagePullPolicy Always; use the locally imported image instead.</span>
<span class="w">        </span><span class="nt">imagePullPolicy</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Never</span>
</pre></div>
<h3 id="running-in-kubernetes">Running in Kubernetes</h3><p>First we build, save, and import the image into <code>k3s</code>:</p>
<div class="highlight"><pre><span></span><span class="err">$</span> <span class="n">docker</span> <span class="n">build</span> <span class="o">.</span> <span class="o">-</span><span class="n">t</span> <span class="n">helloworld</span>
<span class="err">$</span> <span class="n">docker</span> <span class="n">save</span> <span class="n">helloworld</span> <span class="o">></span> <span class="n">helloworld</span><span class="o">.</span><span class="n">tar</span>
<span class="err">$</span> <span class="n">sudo</span> <span class="n">k3s</span> <span class="n">ctr</span> <span class="n">image</span> <span class="kn">import</span> <span class="nn">helloworld.tar</span>
<span class="err">$</span> <span class="n">kubectl</span> <span class="n">apply</span> <span class="o">-</span><span class="n">f</span> <span class="o">./</span><span class="n">manifest</span><span class="o">.</span><span class="n">yaml</span>
<span class="err">$</span> <span class="n">kubectl</span> <span class="n">port</span><span class="o">-</span><span class="n">forward</span> <span class="err">$</span><span class="p">(</span><span class="n">kubectl</span> <span class="n">get</span> <span class="n">pods</span> <span class="o">|</span> <span class="n">grep</span> <span class="n">helloworld</span> <span class="o">|</span> <span class="n">cut</span> <span class="o">-</span><span class="n">d</span> <span class="s1">' '</span> <span class="o">-</span><span class="n">f</span> <span class="mi">1</span><span class="p">)</span> <span class="mi">5000</span> <span class="o">></span> <span class="n">log</span> <span class="mi">2</span><span class="o">>&</span><span class="mi">1</span> <span class="o">&</span>
<span class="err">$</span> <span class="n">curl</span> <span class="n">localhost</span><span class="p">:</span><span class="mi">5000</span>
<span class="n">Hello</span> <span class="n">World</span><span class="p">,</span> <span class="n">Flask</span><span class="o">!</span>
</pre></div>
<p>And that's it!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Latest post is a recipe for creating a self-contained, single-node Kubernetes cluster for CI environments using a basic Flask app.<a href="https://t.co/fegAZFEQzO">https://t.co/fegAZFEQzO</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1287163839306444800?ref_src=twsrc%5Etfw">July 25, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/a-single-node-kubernetes-cluster-without-virtualization-or-a-container-registry.htmlSat, 25 Jul 2020 00:00:00 +0000
- Generating a full-stack application from a databasehttp://notes.eatonphil.com/generating-a-full-stack-application-from-a-database.html<p><a href="https://dbcore.org">DBCore</a> can now generate a TypeScript/React CRUD
UI that is automatically hooked up to the generated REST API (in Go).</p>
<p>The UI has full support for login, viewing (and filtering), editing,
and creating database entities.</p>
<p>PostgreSQL, SQLite and MySQL are supported.</p>
<h3 id="how-to-use?">How to use?</h3><p>The goal of this project is primarily to provide as much useful
boilerplate as possible for full-stack applications. The system is
probably not sufficient to be an entire application development
platform. It's currently missing hooks, overrides, and
per-row/per-table authorization.</p>
<p>The UI code generation may be even less useful in the long term than
the API because UIs are by necessity very diverse. But it is good not
to need to build the same browser-side API, authentication, and
routing logic again now that it's taken care of in code generation.</p>
<h3 id="screenshots">Screenshots</h3><p>Here are a few screenshots of the examples/todo application. Every
page here is auto-generated after reading the database schema. The
browser application is hooked up to the similarly auto-generated API.</p>
<div style="padding-bottom: 15px;">
<small>Sign in</small>
<img style="border: 1px solid #ddd;" src="https://i.imgur.com/1ReEEdf.png"/>
</div>
<div style="padding-bottom: 15px;">
<small>Creating a table entity</small>
<img style="border: 1px solid #ddd;" src="https://i.imgur.com/AiryzjX.png"/>
</div>
<div style="padding-bottom: 15px;">
<small>Viewing all table entities</small>
<img style="border: 1px solid #ddd;" src="https://i.imgur.com/l9jI0LA.png"/>
</div>
<div style="padding-bottom: 15px;">
<small>Filtering table entities</small>
<img style="border: 1px solid #ddd;" src="https://i.imgur.com/J21vQDE.png"/>
</div>
<div style="padding-bottom: 15px;">
<small>Viewing an individual table entity</small>
<img style="border: 1px solid #ddd;" src="https://i.imgur.com/T2VhBFt.png"/>
</div>
<div>
<small>Editing a table entity</small>
<img style="border: 1px solid #ddd;" src="https://i.imgur.com/f2sRN1p.png">
</div><p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">What's new in DBCore: a TypeScript/React UI generated from your database schema and hooked up to the similarly generated Go REST API.<br><br>So you can now generate an entire full stack application from your database schema. Screenshots in the post.<a href="https://t.co/BTGRVBsfUR">https://t.co/BTGRVBsfUR</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1272295312900661250?ref_src=twsrc%5Etfw">June 14, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/generating-a-full-stack-application-from-a-database.htmlSun, 14 Jun 2020 00:00:00 +0000
- Generating a REST API from a databasehttp://notes.eatonphil.com/generating-a-rest-api-from-a-database.html<p>I recently published an <a href="https://eatonphil.github.io/dbcore/">alpha version of a code generation tool,
DBCore,</a> that reads a database
schema from PostgreSQL or MySQL and generates an entire Go API with
CRUD operations, pagination, filtering, and authentication.</p>
<p><img src="https://pbs.twimg.com/media/EZJ7TvNXQAEgraD?format=png&name=large" /></p>
<p>But more than just generating code like
<a href="https://github.com/xo/xo">xo/xo</a> or <a href="https://gnorm.org/">gnorm</a>,
DBCore defines a standard REST API that can be implemented in any
language -- and includes a reference implementation in Go. I'm eager
to add Java and Ruby implementations as well. And I'd be more than
happy to accept community contributions.</p>
<h3 id="boilerplate-&-code-generation">Boilerplate & code generation</h3><p>Web application boilerplate is boring. You should do it once from
scratch (preferably down to the socket layer) and never do it again. I
struggled for the last few years to find the right system to reduce
boilerplate. If I were building a new line-of-business application as
an employee I'd pick one of Rails, ASP.NET, Spring, Django, or
similar.</p>
<p>I've never worked on one of those frameworks professionally and I've
never been able to force myself to learn any of them in my free
time. But even if I could use one of these, none of them get close to
giving you an entire functioning application with authentication,
pagination, and filtering, all based on your existing database.</p>
<p>Over the last few years though I've relied heavily on code generation
for Go projects. Code generation is basically the only way to avoid
repetition while keeping code type-safe in Go. But it's similarly
<a href="https://www.jooq.org/doc/3.13/manual/code-generation/">popular</a> in
more powerful languages like Java.</p>
<p>However, none of the existing projects give you much flexibility or
provide you with enough templates to be useful.</p>
<h3 id="dbcore">DBCore</h3><p>DBCore is written in F# and can be distributed as a static
binary on all systems .NET now supports (read: not just Windows!).</p>
<p>Reading from MySQL or PostgreSQL is supported but I'd like to see that
extended to include SQLite, Oracle, and MS SQL at least.</p>
<p>As mentioned, currently DBCore only provides a Go REST API
template. That only solves half the problem of building an application
though. And while there are some projects that can generate an admin
CRUD interface for you, I want to see that more tightly integrated
into DBCore. So I'll be introducing a new template for a browser
application as well. For each table in the database it will generate a
page showing paginated entries and allow you to create, update, and
delete.</p>
<p>Finally, while the tool only currently has a concept of "browser" and
"api" templates, the project should be able to accept any kind of
template and generate any text based on any database schema.</p>
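<p>As a rough illustration of that idea (not DBCore's actual template format or schema model, both of which are assumptions here), generating text from a schema can be sketched as rendering one template per table. The <code>schema</code> dictionary, the <code>list_query</code> template, and <code>render_all</code> are all hypothetical names.</p>

```python
from string import Template

# Hypothetical schema: table name -> column names.
schema = {
    "users": ["id", "name"],
    "todos": ["id", "title", "done"],
}

# Hypothetical template; any text with $table and $columns placeholders would do.
list_query = Template("SELECT $columns FROM $table LIMIT 25;")


def render_all(schema, template):
    """Render the template once per table in the schema."""
    return [
        template.substitute(table=table, columns=", ".join(columns))
        for table, columns in schema.items()
    ]


rendered = render_all(schema, list_query)
for line in rendered:
    print(line)
```

<p>Swapping in a different template, such as a route handler or a form, changes the generated text without touching the schema-reading side.</p>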
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New blog post, background and goals for dbcore<a href="https://t.co/XW9gUCtvr0">https://t.co/XW9gUCtvr0</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1269467766727327745?ref_src=twsrc%5Etfw">June 7, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/generating-a-rest-api-from-a-database.htmlSat, 06 Jun 2020 00:00:00 +0000
- RFCs and asynchronous-first culturehttp://notes.eatonphil.com/rfcs-and-asynchronous-first-culture.html<p>I hated writing documentation before working on features. But after a
while I realized I couldn't communicate well enough, even with folks I
had a good connection with. It took me a number of mistaken deliveries
to get the message.</p>
<h3 id="sketches-and-mockups">Sketches and mockups</h3><p>Designers solve this by producing low-fidelity sketches early on in
the process, iterating on feedback to produce a high-fidelity
mockup. I solve this by producing short RFC (request for comment)
documents. This isn't an original idea, but I see it so rarely I
wanted to share.</p>
<p>Now as soon as I begin thinking about a technical or organizational
change, I write an RFC. My RFCs are typically a page or two long and
take me 30-60 minutes for a good first draft. I make
clear in the title that it is a proposal or draft. This allows me to
make crazy suggestions without upsetting folks; a draft can be easily
thrown away.</p>
<h3 id="rfc-process">RFC process</h3><p>My RFCs include three key sections:</p>
<ol>
<li>What I think the problem is</li>
<li>Pros/cons of all the solutions I considered</li>
<li>Which solution I'm planning to go with if no one responds to the RFC</li>
</ol>
<p>After I write the first draft I circulate it among a small group of peers
I respect, my boss, etc. I request feedback at their leisure and I check in
every few days with a reminder. If no one responds after a while and
there is little concern, I typically move forward with the proposed
solution.</p>
<p>In addition to clarifying intent up front, this removes the need to
schedule a meeting to <em>discuss a problem</em>. Discussion and
decisions can be held asynchronously. I only schedule a meeting if
there is disagreement that cannot be resolved in writing.</p>
<p>After incorporating feedback, I either throw away the RFC and move on
or feel reasonably confident about the proposal. I send it out to a
wider group of relevant participants. Final meetings are held as
needed.</p>
<h3 id="the-other-option">The other option</h3><p>In contrast, synchronous-first and undocumented proposals make some
sense when you've got a small team in the same timezone with a similar
schedule. Otherwise, you repeatedly reschedule meetings to accommodate
everyone. You spend your first few meetings simply coming to
understand and agree on <em>the problem</em>.</p>
<p>Spending 30-60 minutes to draft a proposal is almost always easier. It
makes the decision-making process faster and produces more accurate
results.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Spending 30-60 minutes to draft a technical (or organizational) proposal is almost always easier for discussion and action than just scheduling a meeting. Or "my asynchronous-first manifesto"<a href="https://t.co/gm4SUzBD2W">https://t.co/gm4SUzBD2W</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1261767623592869896?ref_src=twsrc%5Etfw">May 16, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/rfcs-and-asynchronous-first-culture.htmlSat, 16 May 2020 00:00:00 +0000
- Writing a SQL database from scratch in Go: 4. a database/sql driverhttp://notes.eatonphil.com/database-basics-a-database-sql-driver.html<p class="note">
Previously in database basics:
<br />
<a href="/database-basics.html">1. SELECT, INSERT, CREATE and a REPL</a>
<br />
<a href="/database-basics-expressions-and-where.html">2. binary expressions and WHERE filters</a>
<br />
<a href="/database-basics-indexes.html">3. indexes</a>
</p><p>In this post, we'll extend <a href="https://github.com/eatonphil/gosql">gosql</a>
to implement the <code>database/sql</code> driver interface. This will
allow us to interact with gosql the same way we would interact with
any other database.</p>
<p>Here is a familiar example program (stored in
<code>cmd/sqlexample/main.go</code>) we'll be able to run:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"database/sql"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="s">"github.com/eatonphil/gosql"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">sql</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="s">"postgres"</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Query</span><span class="p">(</span><span class="s">"CREATE TABLE users (name TEXT, age INT);"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Query</span><span class="p">(</span><span class="s">"INSERT INTO users VALUES ('Terry', 45);"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Query</span><span class="p">(</span><span class="s">"INSERT INTO users VALUES ('Anette', 57);"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">rows</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Query</span><span class="p">(</span><span class="s">"SELECT name, age FROM users;"</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">age</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">rows</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">rows</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">rows</span><span class="p">.</span><span class="nx">Scan</span><span class="p">(</span><span class="o">&</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">age</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Name: %s, Age: %d\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">age</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">rows</span><span class="p">.</span><span class="nx">Err</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Our gosql driver will use a single instance of the
<code>Backend</code> for all connections.</p>
<p>Aside from that, it is a simple matter of wrapping our existing APIs
in structs that implement the <code>database/sql/driver.Driver</code>
interface.</p>
<p>This post is largely a discussion of <a href="https://github.com/eatonphil/gosql/commit/0d0aa61a74580a6aef11296741abfba4e1d4ae5c">this
commit</a>.</p>
<h3 id="implementing-the-driver">Implementing the driver</h3><p>A driver is registered by calling <code>sql.Register</code> with a
driver instance.</p>
<p>We'll add the registration code to an <code>init</code> function in a
new file, <code>driver.go</code>:</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Driver</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bkd</span><span class="w"> </span><span class="nx">Backend</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">init</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">sql</span><span class="p">.</span><span class="nx">Register</span><span class="p">(</span><span class="s">"postgres"</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">Driver</span><span class="p">{</span><span class="nx">NewMemoryBackend</span><span class="p">()})</span>
<span class="p">}</span>
</pre></div>
<p>According to the <a href="https://pkg.go.dev/database/sql/driver?tab=doc#Driver">Driver
interface</a>, we
need only implement <code>Open</code> to return a connection instance
that implements the <code>database/sql/driver.Conn</code> interface.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Driver</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bkd</span><span class="w"> </span><span class="nx">Backend</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Driver</span><span class="p">)</span><span class="w"> </span><span class="nx">Open</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Conn</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Conn</span><span class="p">{</span><span class="nx">d</span><span class="p">.</span><span class="nx">bkd</span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">init</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">sql</span><span class="p">.</span><span class="nx">Register</span><span class="p">(</span><span class="s">"postgres"</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">Driver</span><span class="p">{</span><span class="nx">NewMemoryBackend</span><span class="p">()})</span>
<span class="p">}</span>
</pre></div>
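<p>To see registration and <code>Open</code> working together, here is a
minimal, self-contained sketch. The names <code>toyDriver</code>,
<code>toyConn</code>, <code>openCounts</code> and the driver name
<code>"toy"</code> are hypothetical and not part of gosql. It relies on
the documented behavior that <code>sql.Open</code> may only validate its
arguments, so the driver's <code>Open</code> is not called until a
connection is actually needed (here, by <code>Ping</code>).</p>

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// toyDriver is a hypothetical stand-in for Driver. It records every data
// source name it is asked to open.
type toyDriver struct {
	opened []string
}

// Open is the only method driver.Driver requires.
func (d *toyDriver) Open(name string) (driver.Conn, error) {
	d.opened = append(d.opened, name)
	return toyConn{}, nil
}

// toyConn does the bare minimum to satisfy driver.Conn.
type toyConn struct{}

func (toyConn) Prepare(query string) (driver.Stmt, error) {
	return nil, errors.New("Prepare not implemented")
}

func (toyConn) Close() error { return nil }

func (toyConn) Begin() (driver.Tx, error) {
	return nil, errors.New("Begin not implemented")
}

// openCounts registers a toyDriver under driverName and reports how many
// times Open had been called after sql.Open and again after db.Ping.
func openCounts(driverName string) (afterOpen, afterPing int, err error) {
	d := &toyDriver{}
	sql.Register(driverName, d) // panics if driverName is already registered
	db, err := sql.Open(driverName, "example-dsn")
	if err != nil {
		return 0, 0, err
	}
	defer db.Close()
	afterOpen = len(d.opened)
	if err = db.Ping(); err != nil {
		return afterOpen, 0, err
	}
	return afterOpen, len(d.opened), nil
}

func main() {
	afterOpen, afterPing, err := openCounts("toy")
	if err != nil {
		panic(err)
	}
	fmt.Printf("Open calls after sql.Open: %d, after Ping: %d\n", afterOpen, afterPing)
}
```

<p>Because <code>sql.Register</code> panics when a name is registered
twice, registration normally lives in an <code>init</code> function, as
in the post's <code>driver.go</code>.</p>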
<h3 id="implementing-the-connection">Implementing the connection</h3><p>According to the <a href="https://pkg.go.dev/database/sql/driver?tab=doc#Conn">Conn
interface</a>, we
must implement:</p>
<ul>
<li><code>Prepare(query string) (driver.Stmt, error)</code> to handle prepared statements</li>
<li><code>Close</code> to handle cleanup</li>
<li>and <code>Begin</code> to start a transaction</li>
</ul>
<p>The connection can also optionally implement <code>Query</code> and
<code>Exec</code>.</p>
<p>To simplify things we'll panic on <code>Prepare</code> and on
<code>Begin</code> (we don't have transactions yet). There's no
cleanup required so we'll do nothing in <code>Close</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Conn</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">bkd</span><span class="w"> </span><span class="nx">Backend</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">dc</span><span class="w"> </span><span class="o">*</span><span class="nx">Conn</span><span class="p">)</span><span class="w"> </span><span class="nx">Prepare</span><span class="p">(</span><span class="nx">query</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Prepare not implemented"</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">dc</span><span class="w"> </span><span class="o">*</span><span class="nx">Conn</span><span class="p">)</span><span class="w"> </span><span class="nx">Begin</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Tx</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Begin not implemented"</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">dc</span><span class="w"> </span><span class="o">*</span><span class="nx">Conn</span><span class="p">)</span><span class="w"> </span><span class="nx">Close</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
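<p>Panicking keeps the post short. As an alternative (not what the
commit does), the unimplemented methods could return an error that
<code>database/sql</code> hands back to the caller. The sketch below
assumes hypothetical names throughout: <code>errNotImplemented</code>,
<code>errDriver</code>, <code>errConn</code>, <code>tryBegin</code> and
the driver name <code>"toy-errors"</code>.</p>

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// errNotImplemented is a hypothetical sentinel for unsupported features.
var errNotImplemented = errors.New("not implemented")

// errDriver and errConn are stand-ins for Driver and Conn.
type errDriver struct{}

func (errDriver) Open(name string) (driver.Conn, error) { return errConn{}, nil }

type errConn struct{}

func (errConn) Prepare(query string) (driver.Stmt, error) {
	return nil, fmt.Errorf("Prepare: %w", errNotImplemented)
}

func (errConn) Begin() (driver.Tx, error) {
	return nil, fmt.Errorf("Begin: %w", errNotImplemented)
}

func (errConn) Close() error { return nil }

// Compile-time check that errConn satisfies driver.Conn.
var _ driver.Conn = errConn{}

// tryBegin registers the driver under driverName and returns whatever
// error db.Begin reports.
func tryBegin(driverName string) error {
	sql.Register(driverName, errDriver{})
	db, err := sql.Open(driverName, "")
	if err != nil {
		return err
	}
	defer db.Close()
	_, err = db.Begin()
	return err
}

func main() {
	err := tryBegin("toy-errors")
	fmt.Println(err, errors.Is(err, errNotImplemented))
}
```

<p>The caller can then decide what to do with the error instead of the
whole program crashing.</p>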
<p>The only method we actually need, <code>Query</code>, is not required
by the interface. It takes a query string and array of query
parameters, returning an instance implementing
the <code>database/sql/driver.Rows</code> interface.</p>
<p>To implement <code>Query</code>, we basically copy the logic we had in
the <code>cmd/main.go</code> REPL. The only change is that when
handling <code>SELECT</code>, we'll return the results in a
struct that implements the <code>database/sql/driver.Rows</code>
interface.</p>
<p class="note">
<code>database/sql/driver.Rows</code> is not the same type as
<code>database/sql.Rows</code>, which may sound more
familiar. <code>database/sql/driver.Rows</code> is a simpler,
lower-level interface.
</p><p>If we receive parameterized query arguments, we'll panic for
now since they aren't supported yet. And if the query involves multiple
statements, we'll process only the first statement.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">dc</span><span class="w"> </span><span class="o">*</span><span class="nx">Conn</span><span class="p">)</span><span class="w"> </span><span class="nx">Query</span><span class="p">(</span><span class="nx">query</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">[]</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Rows</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// TODO: support parameterization</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Parameterization not supported"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">parser</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">Parser</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">query</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Error while parsing: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// NOTE: ignoring all but the first statement</span>
<span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">Kind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">CreateIndexKind</span><span class="p">:</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">dc</span><span class="p">.</span><span class="nx">bkd</span><span class="p">.</span><span class="nx">CreateIndex</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">CreateIndexStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Error adding index on table: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">CreateTableKind</span><span class="p">:</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">dc</span><span class="p">.</span><span class="nx">bkd</span><span class="p">.</span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">CreateTableStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Error creating table: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">DropTableKind</span><span class="p">:</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">dc</span><span class="p">.</span><span class="nx">bkd</span><span class="p">.</span><span class="nx">DropTable</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">DropTableStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Error dropping table: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">InsertKind</span><span class="p">:</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">dc</span><span class="p">.</span><span class="nx">bkd</span><span class="p">.</span><span class="nx">Insert</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">InsertStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Error inserting values: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">SelectKind</span><span class="p">:</span>
<span class="w"> </span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dc</span><span class="p">.</span><span class="nx">bkd</span><span class="p">.</span><span class="nx">Select</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">SelectStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Rows</span><span class="p">{</span>
<span class="w"> </span><span class="nx">rows</span><span class="p">:</span><span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">Rows</span><span class="p">,</span>
<span class="w"> </span><span class="nx">columns</span><span class="p">:</span><span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">Columns</span><span class="p">,</span>
<span class="w"> </span><span class="nx">index</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
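<p>The sketch below shows how <code>database/sql</code> picks up a
connection-level <code>Query</code> method and wraps the returned
<code>driver.Rows</code> in the familiar <code>sql.Rows</code>. It does
not use gosql: <code>staticDriver</code>, <code>staticConn</code>,
<code>staticRows</code>, <code>fetchUsers</code> and the driver name
<code>"static"</code> are hypothetical, and every query returns the same
two hard-coded rows. Note that current Go documentation marks
<code>driver.Queryer</code> as deprecated in favor of
<code>driver.QueryerContext</code>, although <code>database/sql</code>
still calls it.</p>

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"io"
)

// staticDriver, staticConn and staticRows are hypothetical stand-ins for
// Driver, Conn and Rows.
type staticDriver struct{}

func (staticDriver) Open(name string) (driver.Conn, error) { return staticConn{}, nil }

type staticConn struct{}

func (staticConn) Prepare(query string) (driver.Stmt, error) {
	return nil, errors.New("Prepare not implemented")
}

func (staticConn) Begin() (driver.Tx, error) {
	return nil, errors.New("Begin not implemented")
}

func (staticConn) Close() error { return nil }

// Query has the signature of the optional driver.Queryer interface. The
// query text is ignored here; a real driver would parse and execute it.
func (staticConn) Query(query string, args []driver.Value) (driver.Rows, error) {
	return &staticRows{
		columns: []string{"name", "age"},
		rows: [][]driver.Value{
			{"Terry", int64(45)},
			{"Anette", int64(57)},
		},
	}, nil
}

type staticRows struct {
	columns []string
	rows    [][]driver.Value
	index   int
}

func (r *staticRows) Columns() []string { return r.columns }

func (r *staticRows) Close() error { return nil }

func (r *staticRows) Next(dest []driver.Value) error {
	if r.index >= len(r.rows) {
		return io.EOF
	}
	copy(dest, r.rows[r.index])
	r.index++
	return nil
}

func init() { sql.Register("static", staticDriver{}) }

// fetchUsers runs a query through database/sql and formats each row.
func fetchUsers() ([]string, error) {
	db, err := sql.Open("static", "")
	if err != nil {
		return nil, err
	}
	defer db.Close()
	rows, err := db.Query("SELECT name, age FROM users;")
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var out []string
	for rows.Next() {
		var name string
		var age uint64
		if err := rows.Scan(&name, &age); err != nil {
			return nil, err
		}
		out = append(out, fmt.Sprintf("%s:%d", name, age))
	}
	return out, rows.Err()
}

func main() {
	users, err := fetchUsers()
	if err != nil {
		panic(err)
	}
	fmt.Println(users)
}
```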
<h3 id="implementing-results">Implementing results</h3><p>According to the <a href="https://pkg.go.dev/database/sql/driver?tab=doc#Rows">Rows
interface</a> we
must implement:</p>
<ul>
<li><code>Columns() []string</code> to return an array of column names</li>
<li><code>Next(dest []Value) error</code> to populate a row array with the next row's worth of cells</li>
<li>and <code>Close() error</code></li>
</ul>
<p>Our <code>Rows</code> struct will contain the rows and columns as
returned from <code>Backend</code>, and will also contain an
<code>index</code> field we can use in <code>Next</code> to populate
the next row of cells.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Rows</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">[]</span><span class="nx">ResultColumn</span>
<span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="kt">uint64</span>
<span class="w"> </span><span class="nx">rows</span><span class="w"> </span><span class="p">[][]</span><span class="nx">Cell</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Columns</span><span class="p">()</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w"> </span><span class="p">{}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Close</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Next</span><span class="p">(</span><span class="nx">dest</span><span class="w"> </span><span class="p">[]</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{}</span>
</pre></div>
<p>For <code>Columns</code> we simply need to extract and
return the column names from <code>ResultColumn</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Columns</span><span class="p">()</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Name</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">columns</span>
<span class="p">}</span>
</pre></div>
<p>For <code>Next</code> we need to iterate over each cell in the current
row and retrieve its Go value, storing it in <code>dest</code>. The
<code>dest</code> argument is simply a fixed-length array of
<code>interface{}</code>, so we'll need no manual conversion.</p>
<p>Once we've reached the last row, the <code>Next</code> contract is to
return an <code>io.EOF</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Next</span><span class="p">(</span><span class="nx">dest</span><span class="w"> </span><span class="p">[]</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">index</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">rows</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">rows</span><span class="p">[</span><span class="nx">r</span><span class="p">.</span><span class="nx">index</span><span class="p">]</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">idx</span><span class="p">,</span><span class="w"> </span><span class="nx">cell</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">typ</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">columns</span><span class="p">[</span><span class="nx">idx</span><span class="p">].</span><span class="nx">Type</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">typ</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">IntType</span><span class="p">:</span>
<span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cell</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">i</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">i</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">TextType</span><span class="p">:</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cell</span><span class="p">.</span><span class="nx">AsText</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">s</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">:</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cell</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">b</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">b</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">index</span><span class="o">++</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>Finally in <code>Close</code> we'll set <code>index</code> to
the number of rows to force <code>Next</code> to only ever
return <code>io.EOF</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Close</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">index</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">rows</span><span class="p">))</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>And that's all the changes needed to implement a
<code>database/sql</code> driver! See
<a href="https://github.com/eatonphil/gosql/commit/0d0aa61a74580a6aef11296741abfba4e1d4ae5c#diff-749da71b40f8ff06fc9e78ce917b0cce">here</a>
for <code>driver.go</code> in full.</p>
<h3 id="running-the-example">Running the example</h3><p>With the driver in place we can try out the example:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>./cmd/sqlexample/main.go
$<span class="w"> </span>./main
Name:<span class="w"> </span>Terry,<span class="w"> </span>Age:<span class="w"> </span><span class="m">45</span>
Name:<span class="w"> </span>Anette,<span class="w"> </span>Age:<span class="w"> </span><span class="m">57</span>
</pre></div>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Next post in the database basics series, implementing a database/sql driver for more seamless interactions in Go.<a href="https://t.co/AUZfUByNGE">https://t.co/AUZfUByNGE</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1259594720315047942?ref_src=twsrc%5Etfw">May 10, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/database-basics-a-database-sql-driver.htmlSun, 10 May 2020 00:00:00 +0000
- Writing a SQL database from scratch in Go: 3. indexeshttp://notes.eatonphil.com/database-basics-indexes.html<p class="note">
Previously in database basics:
<! forgive me, for I have sinned >
<br />
<a href="/database-basics.html">1. SELECT, INSERT, CREATE and a REPL</a>
<br />
<a href="/database-basics-expressions-and-where.html">2. binary expressions and WHERE filters</a>
<br />
<br />
Next in database basics:
<br />
<a href="/database-basics-a-database-sql-driver.html">4. a database/sql driver</a>
</p><p>In this post, we extend <a href="https://github.com/eatonphil/gosql">gosql</a>
to support indexes. We focus on the addition of <code>PRIMARY
KEY</code> constraints on table creation and some easy optimizations
during <code>SELECT</code> statements.</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="n">run</span><span class="w"> </span><span class="n">cmd</span><span class="o">/</span><span class="n">main</span><span class="p">.</span><span class="k">go</span>
<span class="n">Welcome</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">gosql</span><span class="p">.</span>
<span class="o">#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">INT</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">,</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="nb">INT</span><span class="p">);</span>
<span class="n">ok</span>
<span class="o">#</span><span class="w"> </span><span class="err">\</span><span class="n">d</span><span class="w"> </span><span class="n">users</span>
<span class="k">Table</span><span class="w"> </span><span class="ss">"users"</span>
<span class="k">Column</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">Type</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">Nullable</span>
<span class="c1">---------+---------+-----------</span>
<span class="n">id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nb">integer</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span>
<span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="o">|</span>
<span class="n">age</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nb">integer</span><span class="w"> </span><span class="o">|</span>
<span class="n">Indexes</span><span class="p">:</span>
<span class="w"> </span><span class="ss">"users_pkey"</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w"> </span><span class="n">rbtree</span><span class="w"> </span><span class="p">(</span><span class="ss">"id"</span><span class="p">)</span>
</pre></div>
<p>This post will broadly be a discussion of <a href="https://github.com/eatonphil/gosql/commit/9608511d9888ce3842ec7d1bfa8f77499e8123b2">this
commit</a>.</p>
<h3 id="what-is-an-index?">What is an index?</h3><p>An index is a mapping of a value to a row in a table. The value is
often a column, but it can be many kinds of expressions. Databases
typically store indexes in tree structures that provide O(log(n))
lookup time. When <code>SELECT</code>ing and filtering on a column
that is indexed, a database can greatly improve lookup time by
filtering first on this index. Without an index, a database must do a
linear scan for matching rows. Though sometimes if a condition is
broad enough, even with an index, a database may still end up doing a
linear scan.</p>
<p>While it may make sense initially to map a value to a row using a hash
table for constant lookup times, hash tables don't provide
ordering. So this would prevent an index from being applicable on
anything but equality checks. For example, <code>SELECT x FROM y WHERE
x > 2</code> couldn't use a hash index on <code>x</code>.</p>
<p>Indexes in many SQL databases default to a
<a href="https://www.cs.cornell.edu/courses/cs3110/2012sp/recitations/rec25-B-trees/rec25.html">B-Tree</a>,
which offers efficient ordering of elements. These indexes are thus
not constant-time lookups even if filtering on a unique column for a
single item. Some databases, <a href="https://www.postgresql.org/docs/current/indexes-types.html">like
PostgreSQL</a>,
allow you to use a hash-based index instead of a tree. Here the
previously listed restrictions apply (i.e. only equality checks will
use the index).</p>
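<p>To make the trade-off concrete, here is a minimal sketch that is not
part of gosql. A Go map stands in for a hash index and a sorted slice
stands in for a tree. The names <code>entry</code>,
<code>buildHashIndex</code>, <code>buildOrderedIndex</code>
and <code>rowsGreaterThan</code> are made up for illustration.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// entry pairs an indexed value with the position of its row in the table.
type entry struct {
	value int
	row   int
}

// buildHashIndex maps each value to the rows holding it. Lookups are
// constant time, but the map has no notion of order.
func buildHashIndex(column []int) map[int][]int {
	idx := make(map[int][]int)
	for row, v := range column {
		idx[v] = append(idx[v], row)
	}
	return idx
}

// buildOrderedIndex returns entries sorted by value. The sorted slice is
// an assumption of this sketch; it stands in for a tree.
func buildOrderedIndex(column []int) []entry {
	entries := make([]entry, 0, len(column))
	for row, v := range column {
		entries = append(entries, entry{value: v, row: row})
	}
	sort.SliceStable(entries, func(i, j int) bool {
		return entries[i].value < entries[j].value
	})
	return entries
}

// rowsGreaterThan answers "WHERE x > bound": binary search for the first
// entry above bound, then every entry after it matches.
func rowsGreaterThan(entries []entry, bound int) []int {
	start := sort.Search(len(entries), func(i int) bool {
		return entries[i].value > bound
	})
	rows := []int{}
	for _, e := range entries[start:] {
		rows = append(rows, e.row)
	}
	return rows
}

func main() {
	x := []int{5, 1, 3, 2, 8} // column x of table y, one value per row
	hash := buildHashIndex(x)
	ordered := buildOrderedIndex(x)
	fmt.Println(hash[3])                     // equality: rows where x = 3
	fmt.Println(rowsGreaterThan(ordered, 2)) // range: rows where x > 2
}
```

<p>The map cannot answer <code>x > 2</code> without visiting every
key, while the ordered structure finds the starting point in O(log(n))
and then walks forward.</p>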
<h3 id="upgrading-gosql">Upgrading gosql</h3><p>We proceed as follows:</p>
<ul>
<li>Upgrade table creation to support specifying a primary key<ul>
<li>Pick a tree data structure for the index, adding it to the table</li>
</ul>
</li>
<li>Upgrade <code>INSERT</code>s to let any indexes on the table process the new row</li>
<li>Upgrade <code>SELECT</code>s to make use of any indexes, if possible</li>
</ul>
<h3 id="upgrading-table-creation">Upgrading table creation</h3><p>To allow the specification of a single column as the primary key when
creating a table, we have to first modify the lexer and parser.</p>
<h4 id="lexing/parsing">Lexing/parsing</h4><p>Since we've covered this process a few times already, suffice it to say
we make the following key additions:</p>
<ul>
<li><a href="https://github.com/eatonphil/gosql/blob/9608511d9888ce3842ec7d1bfa8f77499e8123b2/lexer.go#L36">Add <code>PRIMARY KEY</code> as a new keyword token to the lexer</a></li>
<li><a href="https://github.com/eatonphil/gosql/blob/9608511d9888ce3842ec7d1bfa8f77499e8123b2/parser.go#L425">Add a check for this token to the parsing of column definitions</a></li>
<li><a href="https://github.com/eatonphil/gosql/blob/9608511d9888ce3842ec7d1bfa8f77499e8123b2/ast.go#L98">Modify the AST to store a boolean value whether a column is a primary key</a></li>
</ul>
<h4 id="in-memory-backend">In-memory backend</h4><p>Next we move on to handling a primary key during table creation.</p>
<p>Since there are many existing papers and blogs on implementing tree
data structures, we will import an open-source implementation. And
while most databases use a B-Tree, the most important properties of
the tree for our purposes are 1) efficient ordering and 2) optional
support for duplicate keys. We go with a left-leaning Red-Black Tree,
<a href="https://github.com/petar/GoLLRB">GoLLRB</a>.</p>
<p>The full definition of an index now includes:</p>
<ul>
<li>A name</li>
<li>An expression (at first we only support this being an identifier referring to a
column)</li>
<li>A unique flag</li>
<li>A type name (it will just be <code>rbtree</code> for now)</li>
<li>A primary key flag (so we know to apply null checks among other things)</li>
<li>And the actual tree itself</li>
</ul>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span>
<span class="w"> </span><span class="nx">unique</span><span class="w"> </span><span class="kt">bool</span>
<span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="kt">bool</span>
<span class="w"> </span><span class="nx">tree</span><span class="w"> </span><span class="o">*</span><span class="nx">llrb</span><span class="p">.</span><span class="nx">LLRB</span>
<span class="w"> </span><span class="nx">typ</span><span class="w"> </span><span class="kt">string</span>
<span class="p">}</span>
</pre></div>
<p>When we create a table, we add an index if one of the columns is a
primary key. We call out to a new public
method, <code>CreateIndex</code>, that will handle actually setting
things up.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">crt</span><span class="w"> </span><span class="o">*</span><span class="nx">CreateTableStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">crt</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">];</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrTableAlreadyExists</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">createTable</span><span class="p">()</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">crt</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span>
<span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">crt</span><span class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="o">*</span><span class="nx">crt</span><span class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="nx">ColumnType</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">datatype</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"int"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">IntType</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"text"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">TextType</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"boolean"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">BoolType</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="nb">delete</span><span class="p">(</span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrInvalidDatatype</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">primaryKey</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">delete</span><span class="p">(</span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrPrimaryKeyAlreadyExists</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&</span><span class="nx">expression</span><span class="p">{</span>
<span class="w"> </span><span class="nx">literal</span><span class="p">:</span><span class="w"> </span><span class="o">&</span><span class="nx">col</span><span class="p">.</span><span class="nx">name</span><span class="p">,</span>
<span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">literalKind</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="p">,</span><span class="w"> </span><span class="nx">dt</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">CreateIndex</span><span class="p">(</span><span class="o">&</span><span class="nx">CreateIndexStatement</span><span class="p">{</span>
<span class="w"> </span><span class="nx">table</span><span class="p">:</span><span class="w"> </span><span class="nx">crt</span><span class="p">.</span><span class="nx">name</span><span class="p">,</span>
<span class="w"> </span><span class="nx">name</span><span class="p">:</span><span class="w"> </span><span class="nx">token</span><span class="p">{</span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">"_pkey"</span><span class="p">},</span>
<span class="w"> </span><span class="nx">unique</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="nx">primaryKey</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span>
<span class="w"> </span><span class="nx">exp</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">primaryKey</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">delete</span><span class="p">(</span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>Implementing <code>CreateIndex</code> is just a matter of adding a new
index to the table.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">CreateIndex</span><span class="p">(</span><span class="nx">ci</span><span class="w"> </span><span class="o">*</span><span class="nx">CreateIndexStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">ci</span><span class="p">.</span><span class="nx">table</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">indexes</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">index</span><span class="p">.</span><span class="nx">name</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">ci</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrIndexAlreadyExists</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&</span><span class="nx">index</span><span class="p">{</span>
<span class="w"> </span><span class="nx">exp</span><span class="p">:</span><span class="w"> </span><span class="nx">ci</span><span class="p">.</span><span class="nx">exp</span><span class="p">,</span>
<span class="w"> </span><span class="nx">unique</span><span class="p">:</span><span class="w"> </span><span class="nx">ci</span><span class="p">.</span><span class="nx">unique</span><span class="p">,</span>
<span class="w"> </span><span class="nx">primaryKey</span><span class="p">:</span><span class="w"> </span><span class="nx">ci</span><span class="p">.</span><span class="nx">primaryKey</span><span class="p">,</span>
<span class="w"> </span><span class="nx">name</span><span class="p">:</span><span class="w"> </span><span class="nx">ci</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span>
<span class="w"> </span><span class="nx">tree</span><span class="p">:</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">New</span><span class="p">(),</span>
<span class="w"> </span><span class="nx">typ</span><span class="p">:</span><span class="w"> </span><span class="s">"rbtree"</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">table</span><span class="p">.</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>And that's it for creation of tables and indexes! Table creation is
also the last time we need to make changes to the gosql
frontend. The rest of the changes simply wrap existing insertion and
selection.</p>
<h3 id="upgrading-insert">Upgrading INSERT</h3><p>When a row is inserted into a table, each index on that table needs to
process the row so it can add value-to-row mappings to the index.</p>
<p class="note">
In the project code, you'll notice logic in <code>CreateIndex</code>
to also go back over all existing rows to add them to the new index.
  This post does not further discuss the case where an index is
  created after its table is created. After reading this post, that case
  should be easy to follow.
</p><p>Adding a row to an index is a matter of evaluating the index expression
against that row and storing the resulting value in the tree. Along
with the value, we store the integer index of the row in the
table.</p>
<p>If the index is required to be unique, we first check that the value
does not yet exist.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="nx">index</span><span class="p">)</span><span class="w"> </span><span class="nx">addRow</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">rowIndex</span><span class="w"> </span><span class="kt">uint</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">indexValue</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">exp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">indexValue</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrViolatesNotNullConstraint</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">unique</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">Has</span><span class="p">(</span><span class="nx">treeItem</span><span class="p">{</span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">indexValue</span><span class="p">})</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrViolatesUniqueConstraint</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">InsertNoReplace</span><span class="p">(</span><span class="nx">treeItem</span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">indexValue</span><span class="p">,</span>
<span class="w"> </span><span class="nx">index</span><span class="p">:</span><span class="w"> </span><span class="nx">rowIndex</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
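<p>The check-then-insert behavior can be tried in isolation with a
small standalone sketch. It makes some assumptions: a sorted slice
stands in for the tree, values are plain <code>int</code>s, and the
names <code>sortedIndex</code>, <code>item</code>
and <code>errUnique</code> are hypothetical rather than taken from
gosql.</p>

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// errUnique is a hypothetical stand-in for ErrViolatesUniqueConstraint.
var errUnique = errors.New("violates unique constraint")

// item holds the indexed value plus the integer position of the row
// in the table.
type item struct {
	value int
	row   uint
}

// sortedIndex keeps items ordered by value. A sorted slice stands in for
// the tree here, so inserts are O(n) rather than O(log(n)).
type sortedIndex struct {
	unique bool
	items  []item
}

// position returns the first slot whose value is >= v.
func (s *sortedIndex) position(v int) int {
	return sort.Search(len(s.items), func(i int) bool {
		return s.items[i].value >= v
	})
}

// has reports whether any row is already indexed under v.
func (s *sortedIndex) has(v int) bool {
	p := s.position(v)
	return p < len(s.items) && s.items[p].value == v
}

// addRow rejects duplicates when the index is unique. Otherwise it
// inserts the value-to-row mapping while preserving order.
func (s *sortedIndex) addRow(v int, row uint) error {
	if s.unique && s.has(v) {
		return errUnique
	}
	p := s.position(v)
	s.items = append(s.items, item{})
	copy(s.items[p+1:], s.items[p:]) // shift the tail right by one
	s.items[p] = item{value: v, row: row}
	return nil
}

func main() {
	pkey := &sortedIndex{unique: true}
	fmt.Println(pkey.addRow(7, 0)) // first row, id 7
	fmt.Println(pkey.addRow(3, 1)) // second row, id 3
	fmt.Println(pkey.addRow(7, 2)) // duplicate id, rejected
}
```

<p>With <code>unique</code> left as <code>false</code>, the same
value can be added more than once, which is the duplicate-key property
we wanted from the tree.</p>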
<p>And that's it for insertion!</p>
<h3 id="upgrading-select">Upgrading SELECT</h3><p>Until now, the logic for selecting rows from a table has been to pick the
table and iterate over all rows. If a row does not match
the <code>WHERE</code> filter, we skip it.</p>
<p>If the table has an index and we are using the index in a recognized
pattern in the <code>WHERE</code> AST (more on that later), we can
pre-filter the table based on the index before iterating over each
row. We can do this for each index and for each time a recognized
pattern shows up.</p>
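<p>As a rough illustration of what a "recognized pattern" could look
like, here is a sketch over a hypothetical, pared-down expression type
rather than gosql's actual AST. The
names <code>expr</code>, <code>indexable</code>
and <code>candidates</code> are assumptions made for this example. It
treats <code>column op literal</code> as usable when the column is
indexed, and only looks through <code>AND</code> nodes.</p>

```go
package main

import "fmt"

// expr is a hypothetical expression node: either a leaf (column
// identifier or integer literal) or a binary operation.
type expr struct {
	column string // set for identifier leaves
	lit    *int   // set for literal leaves
	op     string // set for binary nodes: "=", ">", "and", "or", ...
	a, b   *expr
}

// indexable reports whether e has the shape `column op literal` for an
// indexed column and an operator an ordered index can answer.
func indexable(e *expr, indexed map[string]bool) bool {
	if e == nil || e.a == nil || e.b == nil {
		return false
	}
	switch e.op {
	case "=", "<", ">", "<=", ">=":
	default:
		return false
	}
	return e.a.column != "" && indexed[e.a.column] && e.b.lit != nil
}

// candidates walks the WHERE tree and collects every indexable
// comparison reachable through AND nodes. Anything under OR is skipped,
// because filtering on one side alone could drop rows that match the other.
func candidates(e *expr, indexed map[string]bool) []*expr {
	if e == nil {
		return nil
	}
	if e.op == "and" {
		return append(candidates(e.a, indexed), candidates(e.b, indexed)...)
	}
	if indexable(e, indexed) {
		return []*expr{e}
	}
	return nil
}

func main() {
	two, forty := 2, 40
	// WHERE id > 2 AND age = 40, with an index only on id
	where := &expr{
		op: "and",
		a:  &expr{op: ">", a: &expr{column: "id"}, b: &expr{lit: &two}},
		b:  &expr{op: "=", a: &expr{column: "age"}, b: &expr{lit: &forty}},
	}
	for _, c := range candidates(where, map[string]bool{"id": true}) {
		fmt.Println(c.a.column, c.op, *c.b.lit)
	}
}
```

<p>Each comparison returned this way can narrow the set of rows before
the full <code>WHERE</code> filter runs on what remains.</p>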
<p class="note">
This process is called query planning. We build a simplified
version of what you may see in SQL databases, specifically focusing
on index usage since we don't yet support <code>JOIN</code>s. For
further reading, SQLite has
an <a href="https://www.sqlite.org/queryplanner.html#_lookup_by_index">excellent
document</a> on their query planner for index usage.
</p><div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="nx">slct</span><span class="w"> </span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">createTable</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="kt">bool</span>
<span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Results</span><span class="p">{},</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">Cell</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">ResultColumn</span><span class="p">{}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">createTable</span><span class="p">()</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">memoryCell</span><span class="p">{{}}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">iAndE</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">getApplicableIndexes</span><span class="p">(</span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iAndE</span><span class="p">.</span><span class="nx">i</span>
<span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iAndE</span><span class="p">.</span><span class="nx">e</span>
<span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">index</span><span class="p">.</span><span class="nx">newTableFromSubset</span><span class="p">(</span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">Cell</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">results</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="o">*</span><span class="nx">val</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">finalItems</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">columnName</span><span class="p">,</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="o">*</span><span class="nx">col</span><span class="p">.</span><span class="nx">exp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="nx">ResultColumn</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Type</span><span class="p">:</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="nx">columnName</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">result</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Results</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Columns</span><span class="p">:</span><span class="w"> </span><span class="nx">columns</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Rows</span><span class="p">:</span><span class="w"> </span><span class="nx">results</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>The change is small and easy to miss, so here it is called out:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">iAndE</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">getApplicableIndexes</span><span class="p">(</span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iAndE</span><span class="p">.</span><span class="nx">i</span>
<span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iAndE</span><span class="p">.</span><span class="nx">e</span>
<span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">index</span><span class="p">.</span><span class="nx">newTableFromSubset</span><span class="p">(</span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
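<p>To make the two stages concrete, here is a minimal sketch of
"narrow with the index, then evaluate the full <code>WHERE</code> on
what is left". It is not the article's code: <code>row</code>,
<code>byID</code>, and <code>selectWhere</code> are hypothetical names,
and a plain map stands in for the tree-backed index.</p>

```go
package main

import "fmt"

// row is a hypothetical two-column table row.
type row struct {
	id   int
	name string
}

// selectWhere narrows the candidates with a stand-in index (a map from
// id to row positions), then evaluates the full predicate against the
// surviving candidates only.
func selectWhere(rows []row, byID map[int][]int, id int, pred func(row) bool) []row {
	var out []row
	for _, pos := range byID[id] {
		if pred(rows[pos]) {
			out = append(out, rows[pos])
		}
	}
	return out
}

func main() {
	rows := []row{{1, "a"}, {2, "b"}, {2, "c"}, {3, "b"}}

	// Build the stand-in index once, up front.
	byID := map[int][]int{}
	for pos, r := range rows {
		byID[r.id] = append(byID[r.id], pos)
	}

	// WHERE id = 2 AND name = 'b'
	got := selectWhere(rows, byID, 2, func(r row) bool {
		return r.id == 2 && r.name == "b"
	})
	fmt.Println(got) // prints [{2 b}]
}
```

<p>The predicate still checks the whole condition, so the index step
only changes how many rows are visited, not which rows are
returned.</p>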
<h4 id="getapplicableindexes">getApplicableIndexes</h4><p>There are probably a few very simple patterns we could look for, but
for now we look for boolean expressions joined by <code>AND</code>
that contain an index expression.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">)</span><span class="w"> </span><span class="nx">getApplicableIndexes</span><span class="p">(</span><span class="nx">where</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="nx">indexAndExpression</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">linearizeExpressions</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">where</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="p">[]</span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="nx">expression</span>
<span class="w"> </span><span class="nx">linearizeExpressions</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">where</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="p">[]</span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="nx">expression</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">where</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">where</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">binaryKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exps</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">where</span><span class="p">.</span><span class="nx">binary</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">orKeyword</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exps</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">where</span><span class="p">.</span><span class="nx">binary</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">andKeyword</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">linearizeExpressions</span><span class="p">(</span><span class="o">&</span><span class="nx">where</span><span class="p">.</span><span class="nx">binary</span><span class="p">.</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">exps</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">linearizeExpressions</span><span class="p">(</span><span class="o">&</span><span class="nx">where</span><span class="p">.</span><span class="nx">binary</span><span class="p">.</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">exps</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">exps</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">where</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">linearizeExpressions</span><span class="p">(</span><span class="nx">where</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">expression</span><span class="p">{})</span>
<span class="w"> </span><span class="nx">iAndE</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">indexAndExpression</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">indexes</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">index</span><span class="p">.</span><span class="nx">applicableValue</span><span class="p">(</span><span class="nx">exp</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">iAndE</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">iAndE</span><span class="p">,</span><span class="w"> </span><span class="nx">indexAndExpression</span><span class="p">{</span>
<span class="w"> </span><span class="nx">i</span><span class="p">:</span><span class="w"> </span><span class="nx">index</span><span class="p">,</span>
<span class="w"> </span><span class="nx">e</span><span class="p">:</span><span class="w"> </span><span class="nx">exp</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">iAndE</span>
<span class="p">}</span>
</pre></div>
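<p>The flattening step can be seen in isolation with a toy expression
type. This is only a sketch: <code>expr</code> and
<code>conjuncts</code> are hypothetical stand-ins for the article's
<code>expression</code> type and <code>linearizeExpressions</code>
closure, and leaves are kept as plain strings.</p>

```go
package main

import "fmt"

// expr is a stripped-down stand-in for an expression node: either a
// leaf comparison kept as text, or a binary AND/OR node.
type expr struct {
	op          string // "AND", "OR", or "" for a leaf
	left, right *expr
	leaf        string // e.g. "id = 2"; only set when op == ""
}

// conjuncts flattens a tree of ANDs into its leaf comparisons.
// An OR subtree contributes nothing: a row may satisfy either branch,
// so neither branch alone can be used to narrow the table.
func conjuncts(e *expr, out []string) []string {
	if e == nil {
		return out
	}
	switch e.op {
	case "AND":
		out = conjuncts(e.left, out)
		return conjuncts(e.right, out)
	case "OR":
		return out
	default:
		return append(out, e.leaf)
	}
}

func main() {
	// id = 2 AND (name = 'a' OR name = 'b') AND age > 30
	where := &expr{
		op: "AND",
		left: &expr{
			op:   "AND",
			left: &expr{leaf: "id = 2"},
			right: &expr{
				op:    "OR",
				left:  &expr{leaf: "name = 'a'"},
				right: &expr{leaf: "name = 'b'"},
			},
		},
		right: &expr{leaf: "age > 30"},
	}
	fmt.Println(conjuncts(where, nil)) // prints [id = 2 age > 30]
}
```

<p>Only the comparisons that must hold for every matching row survive,
which is what makes them safe candidates for an index lookup.</p>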
<p>More specifically though, within binary operations we only support
matching on an index if the following three conditions are met:</p>
<ul>
<li>the operator is one of <code>=</code>,
<code>&lt;&gt;</code>, <code>&gt;</code>, <code>&lt;</code>, <code>&gt;=</code>, or
<code>&lt;=</code></li>
<li>one of the operands is an identifier literal that matches the index's <code>exp</code> value</li>
<li>the other operand is a literal value</li>
</ul>
<p class="note">
This matching is simpler and stricter than in PostgreSQL, where you
can index expressions more generally, not just identifier
literals.
</p><div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="nx">index</span><span class="p">)</span><span class="w"> </span><span class="nx">applicableValue</span><span class="p">(</span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">binaryKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">be</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">binary</span>
<span class="w"> </span><span class="c1">// Find the column and the value in the boolean expression</span>
<span class="w"> </span><span class="nx">columnExp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">be</span><span class="p">.</span><span class="nx">a</span>
<span class="w"> </span><span class="nx">valueExp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">be</span><span class="p">.</span><span class="nx">b</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">columnExp</span><span class="p">.</span><span class="nx">generateCode</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">exp</span><span class="p">.</span><span class="nx">generateCode</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columnExp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">be</span><span class="p">.</span><span class="nx">b</span>
<span class="w"> </span><span class="nx">valueExp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">be</span><span class="p">.</span><span class="nx">a</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Neither side is applicable, return nil</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">columnExp</span><span class="p">.</span><span class="nx">generateCode</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">exp</span><span class="p">.</span><span class="nx">generateCode</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">supportedChecks</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">symbol</span><span class="p">{</span><span class="nx">eqSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">neqSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">gtSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">gteSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">ltSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">lteSymbol</span><span class="p">}</span>
<span class="w"> </span><span class="nx">supported</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">sym</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">supportedChecks</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">sym</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">be</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">supported</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">supported</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">valueExp</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Only index checks on literals supported"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">valueExp</span>
<span class="p">}</span>
</pre></div>
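<p>The three conditions can also be exercised on their own with a toy
model. In this sketch <code>operand</code>, <code>comparison</code>,
and <code>literalFor</code> are hypothetical names, and column matching
is a plain string comparison rather than the
<code>generateCode()</code> comparison used above.</p>

```go
package main

import "fmt"

// operand is one side of a comparison: a column name or a literal.
type operand struct {
	isLiteral bool
	text      string
}

// comparison is a flattened binary expression such as id = 2.
type comparison struct {
	left  operand
	op    string
	right operand
}

var supportedOps = map[string]bool{
	"=": true, "<>": true, ">": true, ">=": true, "<": true, "<=": true,
}

// literalFor returns the literal compared against indexedColumn and
// reports whether cmp meets all three conditions.
func literalFor(indexedColumn string, cmp comparison) (string, bool) {
	// Condition 1: the operator is a supported comparison.
	if !supportedOps[cmp.op] {
		return "", false
	}
	// Condition 2: one side names the indexed column. Try the left
	// side first, then fall back to the right side.
	col, val := cmp.left, cmp.right
	if col.isLiteral || col.text != indexedColumn {
		col, val = cmp.right, cmp.left
	}
	if col.isLiteral || col.text != indexedColumn {
		return "", false
	}
	// Condition 3: the other side is a literal value.
	if !val.isLiteral {
		return "", false
	}
	return val.text, true
}

func main() {
	cases := []comparison{
		{operand{false, "id"}, "=", operand{true, "2"}},          // id = 2
		{operand{true, "2"}, "<", operand{false, "id"}},          // 2 < id
		{operand{false, "id"}, "=", operand{false, "other_id"}}, // id = other_id
		{operand{false, "name"}, "=", operand{true, "'x'"}},      // name = 'x'
	}
	for _, c := range cases {
		lit, ok := literalFor("id", c)
		fmt.Printf("%q %v\n", lit, ok)
	}
}
```

<p>With an index on <code>id</code>, the first two comparisons match
and the last two do not. When the literal sits on the left, as in
<code>2 &lt; id</code>, a caller of this sketch that turns the operator
into a range lookup would need to account for the flipped
direction.</p>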
<p>And that's it for finding applicable indexes.</p>
<h4 id="newtablefromsubset">newTableFromSubset</h4><p>The last remaining piece is to go from a boolean expression in
a <code>WHERE</code> clause (where an index is applicable) to a subset
of rows in a table.</p>
<p>Since we are only working with patterns of the type
<code>indexed-column OP literal-value</code>, we grab the literal
using the previous <code>applicableValue</code> helper. Then we
look up that literal value in the index and return a new table
containing every row whose indexed value satisfies the operator
against that literal.</p>
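<p>Before the tree-based version, the idea can be sketched with a
sorted slice and binary search. Everything here is an assumption made
for illustration: <code>entry</code> and <code>lookup</code> are
hypothetical names, the indexed values are plain integers, and the
standard library's <code>sort.Search</code> stands in for walking an
ordered tree.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// entry pairs an indexed value with the position of its row in the table.
type entry struct {
	value int
	row   uint
}

// lookup returns the row positions whose indexed value satisfies
// "value OP literal". entries must be sorted by value.
func lookup(entries []entry, op string, literal int) []uint {
	// lo is the first entry with value >= literal,
	// hi is the first entry with value > literal.
	lo := sort.Search(len(entries), func(i int) bool { return entries[i].value >= literal })
	hi := sort.Search(len(entries), func(i int) bool { return entries[i].value > literal })

	var picked []entry
	switch op {
	case "=":
		picked = entries[lo:hi]
	case "<>":
		// Copy so that appending never writes into entries itself.
		picked = append(append([]entry{}, entries[:lo]...), entries[hi:]...)
	case "<":
		picked = entries[:lo]
	case "<=":
		picked = entries[:hi]
	case ">":
		picked = entries[hi:]
	case ">=":
		picked = entries[lo:]
	}

	rows := make([]uint, 0, len(picked))
	for _, e := range picked {
		rows = append(rows, e.row)
	}
	return rows
}

func main() {
	// Sorted by value, not by row position.
	idx := []entry{{1, 3}, {2, 0}, {2, 4}, {5, 1}, {9, 2}}
	fmt.Println(lookup(idx, "=", 2))  // prints [0 4]
	fmt.Println(lookup(idx, ">=", 5)) // prints [1 2]
	fmt.Println(lookup(idx, "<>", 2)) // prints [3 1 2]
}
```

<p>Each operator maps to a contiguous range of the ordered entries (or
two ranges for <code>&lt;&gt;</code>), which is why an ordered structure
suits these comparisons.</p>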
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="nx">index</span><span class="p">)</span><span class="w"> </span><span class="nx">newTableFromSubset</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">valueExp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">applicableValue</span><span class="p">(</span><span class="nx">exp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">valueExp</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">createTable</span><span class="p">().</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">valueExp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">tiValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">treeItem</span><span class="p">{</span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">value</span><span class="p">}</span>
<span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="kt">uint</span><span class="p">{}</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">symbol</span><span class="p">(</span><span class="nx">exp</span><span class="p">.</span><span class="nx">binary</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">eqSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">AscendGreaterOrEqual</span><span class="p">(</span><span class="nx">tiValue</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Equal</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">neqSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">AscendGreaterOrEqual</span><span class="p">(</span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Inf</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Equal</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ltSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">DescendLessOrEqual</span><span class="p">(</span><span class="nx">tiValue</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Compare</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">lteSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">DescendLessOrEqual</span><span class="p">(</span><span class="nx">tiValue</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Compare</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gtSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">AscendGreaterOrEqual</span><span class="p">(</span><span class="nx">tiValue</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Compare</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gteSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">AscendGreaterOrEqual</span><span class="p">(</span><span class="nx">tiValue</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Compare</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">newT</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">createTable</span><span class="p">()</span>
<span class="w"> </span><span class="nx">newT</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span>
<span class="w"> </span><span class="nx">newT</span><span class="p">.</span><span class="nx">columnTypes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span>
<span class="w"> </span><span class="nx">newT</span><span class="p">.</span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">indexes</span>
<span class="w"> </span><span class="nx">newT</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">memoryCell</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">newT</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">newT</span><span class="p">.</span><span class="nx">rows</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="p">[</span><span class="nx">index</span><span class="p">])</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">newT</span>
<span class="p">}</span>
</pre></div>
<p>As you can see, an index does not always improve on a linear
search. Imagine a table of 1 million rows indexed
on an autoincrementing column. Imagine filtering on <code>col >
10</code>. The index can eliminate the first 10 rows but still
returns a pre-filtered table of around 1 million rows that must
be passed through the <code>WHERE</code> filter.</p>
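<p>To make those numbers concrete, here is a small standalone sketch. It
is not gosql code: it uses a sorted slice in place of the tree, and
<code>candidatesGreaterThan</code> is a hypothetical helper introduced only
for illustration. It counts how many rows a <code>> 10</code> lookup
leaves for the <code>WHERE</code> filter.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// candidatesGreaterThan reports how many keys in a sorted slice are
// strictly greater than bound. This stands in for an index lookup;
// it is a hypothetical helper, not part of gosql.
func candidatesGreaterThan(sortedKeys []int, bound int) int {
	// sort.SearchInts returns the first position whose key is >= bound+1,
	// which is the first key strictly greater than bound.
	first := sort.SearchInts(sortedKeys, bound+1)
	return len(sortedKeys) - first
}

func main() {
	// Autoincrementing ids 1 through 1,000,000.
	keys := make([]int, 1000000)
	for i := range keys {
		keys[i] = i + 1
	}

	n := candidatesGreaterThan(keys, 10)
	fmt.Printf("%d of %d rows still reach the WHERE filter\n", n, len(keys))
}
```

<p>Under these assumptions the lookup skips only 10 rows, so nearly the
whole table is still scanned afterwards.</p>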
<p>Additionally, since we process each boolean expression one at a time,
we can't take advantage of knowledge that might seem obvious to a
human when two boolean expressions together bound a range. For
example in <code>x > 10 AND x < 20</code> we can see that only
integers from 11 to 19 are applicable. But the current logic goes
through each expression separately and finds all rows that match either
one. Only the final linear search through all pre-filtered rows
eliminates the bulk.</p>
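<p>For comparison, here is a rough sketch of what treating both bounds as
a single range could look like. It is not how gosql works today. It
assumes a sorted slice rather than the tree, and
<code>rangeExclusive</code> is a hypothetical name chosen for this
example.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// rangeExclusive returns the keys k from a sorted slice where
// lo < k < hi. Both bounds are resolved with a binary search, so no
// key outside the range is ever visited. Hypothetical helper, not
// part of gosql.
func rangeExclusive(sortedKeys []int, lo, hi int) []int {
	start := sort.SearchInts(sortedKeys, lo+1) // first key > lo
	end := sort.SearchInts(sortedKeys, hi)     // first key >= hi
	if start >= end {
		return nil // empty or inverted range
	}
	return sortedKeys[start:end]
}

func main() {
	// Ids 1 through 100.
	keys := make([]int, 100)
	for i := range keys {
		keys[i] = i + 1
	}

	// Equivalent to: x > 10 AND x < 20
	fmt.Println(rangeExclusive(keys, 10, 20))
}
```

<p>Run against ids 1 through 100, this prints the nine values from 11 to
19 without touching the other rows.</p>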
<p class="note">
Thankfully real databases have decades of optimizations. But even
then it can be difficult to know what index usages are being
optimized without reading documentation, benchmarking, using
<code>EXPLAIN ANALYSE</code>, or reading the source.
</p><p>But that's it for changes needed to support basic indexes end-to-end!</p>
<h3 id="trialing-an-index">Trialing an index</h3><p>Since the addition of indexes is so seamless, it is difficult to tell
without a trial whether the index is effective. So we'll write a simple
program that inserts N rows, with or without an index depending on a
flag, and then queries for the first and last items inserted. It prints
the time taken and memory allocated during both insertion and selection.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"runtime"</span>
<span class="w"> </span><span class="s">"strconv"</span>
<span class="w"> </span><span class="s">"time"</span>
<span class="w"> </span><span class="s">"github.com/eatonphil/gosql"</span>
<span class="p">)</span>
<span class="kd">var</span><span class="w"> </span><span class="nx">inserts</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="kd">var</span><span class="w"> </span><span class="nx">lastId</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="kd">var</span><span class="w"> </span><span class="nx">firstId</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">doInsert</span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Backend</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">parser</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Parser</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">inserts</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">lastId</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">i</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">firstId</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">lastId</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"INSERT INTO users VALUES (%d)"</span><span class="p">,</span><span class="w"> </span><span class="nx">lastId</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">Insert</span><span class="p">(</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">InsertStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">doSelect</span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Backend</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">parser</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Parser</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"SELECT id FROM users WHERE id = %d"</span><span class="p">,</span><span class="w"> </span><span class="nx">lastId</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">Select</span><span class="p">(</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">SelectStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Expected 1 row"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="o">*</span><span class="nx">r</span><span class="p">.</span><span class="nx">Rows</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">].</span><span class="nx">AsInt</span><span class="p">())</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">inserts</span><span class="o">-</span><span class="mi">1</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"Bad row, got: %d"</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">r</span><span class="p">.</span><span class="nx">Rows</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">].</span><span class="nx">AsInt</span><span class="p">()))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"SELECT id FROM users WHERE id = %d"</span><span class="p">,</span><span class="w"> </span><span class="nx">firstId</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">Select</span><span class="p">(</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">SelectStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">"Expected 1 row"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="o">*</span><span class="nx">r</span><span class="p">.</span><span class="nx">Rows</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">].</span><span class="nx">AsInt</span><span class="p">())</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"Bad row, got: %d"</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">r</span><span class="p">.</span><span class="nx">Rows</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">].</span><span class="nx">AsInt</span><span class="p">()))</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">perf</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Backend</span><span class="p">,</span><span class="w"> </span><span class="nx">cb</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Backend</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">start</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">()</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Starting"</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">)</span>
<span class="w"> </span><span class="nx">cb</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Finished %s: %f seconds\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Since</span><span class="p">(</span><span class="nx">start</span><span class="p">).</span><span class="nx">Seconds</span><span class="p">())</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">m</span><span class="w"> </span><span class="nx">runtime</span><span class="p">.</span><span class="nx">MemStats</span>
<span class="w"> </span><span class="nx">runtime</span><span class="p">.</span><span class="nx">ReadMemStats</span><span class="p">(</span><span class="o">&</span><span class="nx">m</span><span class="p">)</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Alloc = %d MiB\n\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">m</span><span class="p">.</span><span class="nx">Alloc</span><span class="o">/</span><span class="mi">1024</span><span class="o">/</span><span class="mi">1024</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">mb</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">NewMemoryBackend</span><span class="p">()</span>
<span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--with-index"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"--inserts"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">inserts</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">1</span><span class="p">])</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">""</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">" PRIMARY KEY"</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">parser</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Parser</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"CREATE TABLE users (id INT%s)"</span><span class="p">,</span><span class="w"> </span><span class="nx">primaryKey</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">CreateTableStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">indexingString</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">" with indexing enabled"</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">index</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">indexingString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">""</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Inserting %d rows%s\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">inserts</span><span class="p">,</span><span class="w"> </span><span class="nx">indexingString</span><span class="p">)</span>
<span class="w"> </span><span class="nx">perf</span><span class="p">(</span><span class="s">"INSERT"</span><span class="p">,</span><span class="w"> </span><span class="nx">mb</span><span class="p">,</span><span class="w"> </span><span class="nx">doInsert</span><span class="p">)</span>
<span class="w"> </span><span class="nx">perf</span><span class="p">(</span><span class="s">"SELECT"</span><span class="p">,</span><span class="w"> </span><span class="nx">mb</span><span class="p">,</span><span class="w"> </span><span class="nx">doSelect</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Build and run once without an index:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>cmd/indextest/main.go
$<span class="w"> </span>./main<span class="w"> </span>--inserts<span class="w"> </span><span class="m">1000000</span>
Inserting<span class="w"> </span><span class="m">1000000</span><span class="w"> </span>rows
Starting<span class="w"> </span>INSERT
Finished<span class="w"> </span>INSERT:<span class="w"> </span><span class="m">76</span>.175133<span class="w"> </span>seconds
<span class="nv">Alloc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">239</span><span class="w"> </span>MiB
Starting<span class="w"> </span>SELECT
Finished<span class="w"> </span>SELECT:<span class="w"> </span><span class="m">1</span>.301556<span class="w"> </span>seconds
<span class="nv">Alloc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">238</span><span class="w"> </span>MiB
</pre></div>
<p>And run again with an index:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>./main<span class="w"> </span>--inserts<span class="w"> </span><span class="m">1000000</span><span class="w"> </span>--with-index
Inserting<span class="w"> </span><span class="m">1000000</span><span class="w"> </span>rows<span class="w"> </span>with<span class="w"> </span>indexing<span class="w"> </span>enabled
Starting<span class="w"> </span>INSERT
Finished<span class="w"> </span>INSERT:<span class="w"> </span><span class="m">89</span>.108121<span class="w"> </span>seconds
<span class="nv">Alloc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">341</span><span class="w"> </span>MiB
Starting<span class="w"> </span>SELECT
Finished<span class="w"> </span>SELECT:<span class="w"> </span><span class="m">0</span>.000137<span class="w"> </span>seconds
<span class="nv">Alloc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">341</span><span class="w"> </span>MiB
</pre></div>
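<p>The basic tradeoff you can see here: with the index, inserting
1,000,000 rows takes about 13 seconds longer (89s versus 76s) and uses
roughly 100 MiB more memory (341 MiB versus 239 MiB), but the lookup
drops from about 1.3 seconds to about 0.0001 seconds.</p>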
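<p>To make the shape of that tradeoff concrete outside of gosql, here
is a minimal sketch. It assumes a sorted slice as a stand-in for an
ordered index (it is not gosql's index structure), and the names
<code>scanLookup</code>, <code>sortedIndex</code>, and
<code>buildIndex</code> are made up for illustration.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// scanLookup checks every row until it finds id, like a SELECT
// without an index. Its cost grows linearly with the number of rows.
func scanLookup(rows []int, id int) (int, bool) {
	for pos, v := range rows {
		if v == id {
			return pos, true
		}
	}
	return 0, false
}

// sortedIndex is a stand-in for an ordered index: a second, sorted
// copy of the keys. It costs extra memory and extra work to build.
type sortedIndex struct {
	keys []int
}

func buildIndex(rows []int) sortedIndex {
	keys := make([]int, len(rows))
	copy(keys, rows)
	sort.Ints(keys)
	return sortedIndex{keys: keys}
}

// has does a binary search, so it needs about log2(n) comparisons.
func (ix sortedIndex) has(id int) bool {
	i := sort.SearchInts(ix.keys, id)
	return i < len(ix.keys) && ix.keys[i] == id
}

func main() {
	rows := []int{42, 7, 19, 3, 88}
	ix := buildIndex(rows)
	pos, ok := scanLookup(rows, 19)
	fmt.Println(pos, ok, ix.has(19), ix.has(20))
}
```

<p>The extra copy of the keys is where the additional memory goes, and
keeping it sorted is where the additional insertion time goes.</p>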
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Very excited to share the latest database basics post on implementing indexes in gosql.<a href="https://t.co/QHfjCe1XsC">https://t.co/QHfjCe1XsC</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1256209468133650433?ref_src=twsrc%5Etfw">May 1, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/database-basics-indexes.htmlFri, 01 May 2020 00:00:00 +0000
- Writing a SQL database from scratch in Go: 2. binary expressions and WHERE filtershttp://notes.eatonphil.com/database-basics-expressions-and-where.html<p class="note">
Previously in database basics:
<! forgive me, for I have sinned >
<br />
<a href="/database-basics.html">1. SELECT, INSERT, CREATE and a REPL</a>
<br />
<br />
Next in database basics:
<br />
<a href="/database-basics-indexes.html">3. indexes</a>
<br />
<a href="/database-basics-a-database-sql-driver.html">4. a database/sql driver</a>
</p><p>In this post, we'll extend <a href="https://github.com/eatonphil/gosql">gosql</a>
to support binary expressions and very simple filtering on SELECT
results via WHERE. We'll introduce a general mechanism for
interpreting an expression on a row in a table. The expression may be
an identifier (where the result is the value of the cell corresponding
to that column in the row), a numeric literal, a combination via a
binary expression, etc.</p>
<p>The following interactions will be possible:</p>
<div class="highlight"><pre><span></span><span class="o">#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">,</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="nb">INT</span><span class="p">);</span>
<span class="n">ok</span>
<span class="o">#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'Stephen'</span><span class="p">,</span><span class="w"> </span><span class="mi">16</span><span class="p">);</span>
<span class="n">ok</span>
<span class="o">#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">age</span>
<span class="c1">----------+------</span>
<span class="n">Stephen</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">16</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">result</span><span class="p">)</span>
<span class="n">ok</span>
<span class="o">#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'Adrienne'</span><span class="p">,</span><span class="w"> </span><span class="mi">23</span><span class="p">);</span>
<span class="n">ok</span>
<span class="o">#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">23</span><span class="p">;</span>
<span class="n">age</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">name</span>
<span class="c1">------+-----------</span>
<span class="mi">25</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Adrienne</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">result</span><span class="p">)</span>
<span class="n">ok</span>
<span class="o">#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="n">name</span>
<span class="c1">------------</span>
<span class="n">Stephen</span>
<span class="n">Adrienne</span>
<span class="p">(</span><span class="mi">2</span><span class="w"> </span><span class="n">results</span><span class="p">)</span>
<span class="n">ok</span>
</pre></div>
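<p>Before getting into the parser, it may help to see the target
behavior in plain Go. This is only a sketch of what
<code>SELECT age + 2, name FROM users WHERE age = 23</code> should do.
The <code>user</code> struct and <code>selectWhere</code> function are
hypothetical; gosql stores cells generically rather than in a
fixed-shape struct.</p>

```go
package main

import "fmt"

// user is a hypothetical fixed-shape row, used only for illustration.
type user struct {
	name string
	age  int
}

// selectWhere returns the rows for which the predicate holds,
// mirroring how a WHERE clause filters rows before projection.
func selectWhere(rows []user, pred func(user) bool) []user {
	var out []user
	for _, r := range rows {
		if pred(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	users := []user{{"Stephen", 16}, {"Adrienne", 23}}
	// SELECT age + 2, name FROM users WHERE age = 23;
	matches := selectWhere(users, func(u user) bool { return u.age == 23 })
	for _, u := range matches {
		fmt.Println(u.age+2, u.name)
	}
}
```

<p>The rest of the post replaces the hardcoded predicate and
projection with expressions parsed from SQL.</p>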
<p>The changes we'll make in this post are roughly a walkthrough of
<a href="https://github.com/eatonphil/gosql/commit/bd6a5d0d4a7410699b0d01beaabf91923df34b28">this
commit</a>.</p>
<h3 id="boilerplate-updates">Boilerplate updates</h3><p>There are a few updates to pick up that I won't go into in this
post. Grab the following files from the main repo:</p>
<ul>
<li><a href="https://github.com/eatonphil/gosql/blob/master/lexer.go">lexer.go</a><ul>
<li>The big change here is to use the same keyword matching algorithm
for symbols. This allows us to support symbols that are longer
than one character.</li>
<li>This file also now includes the following keywords and symbols:
<code>and</code>, <code>or</code>, <code>true</code>,
<code>false</code>, <code>=</code>, <code><></code>,
<code>||</code>, and <code>+</code>.</li>
</ul>
</li>
<li><a href="https://github.com/eatonphil/gosql/blob/master/cmd/main.go">cmd/main.go</a><ul>
<li>This file now uses a <a href="https://github.com/olekukonko/tablewriter">third-party table-rendering
library</a> instead of the
hacky, handwritten original one.</li>
<li>This also uses a <a href="https://github.com/chzyer/readline">third-party readline
implementation</a> so you get
history and useful cursor movement in the REPL.</li>
</ul>
</li>
</ul>
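<p>The idea behind sharing the keyword matching algorithm with
symbols can be sketched as a longest-match search. This is not the
code in <code>lexer.go</code>; <code>longestMatch</code> is a
hypothetical helper, and the single-character <code>&lt;</code> and
<code>|</code> options are added here only to show that the longer
symbol wins.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// longestMatch returns the longest option that is a prefix of
// source[cursor:], or "" when none matches. Options must be
// lowercase; the source is lowercased so keywords match in any case.
func longestMatch(source string, cursor int, options []string) string {
	rest := strings.ToLower(source[cursor:])
	best := ""
	for _, opt := range options {
		if strings.HasPrefix(rest, opt) && len(opt) > len(best) {
			best = opt
		}
	}
	return best
}

func main() {
	symbols := []string{"<", "<>", "|", "||", "=", "+"}
	fmt.Println(longestMatch("a <> b", 2, symbols))
	fmt.Println(longestMatch("a || b", 2, symbols))
}
```

<p>Because the longest candidate is always preferred,
<code>&lt;&gt;</code> and <code>||</code> are each read as one symbol
rather than two.</p>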
<h4 id="parsing-boilerplate">Parsing boilerplate</h4><p>We'll redefine three helper functions in <code>parser.go</code> before
going further:
<code>parseToken</code>, <code>parseTokenKind</code>, and
<code>helpMessage</code>.</p>
<p>The <code>parseToken</code> helper will consume a token if it matches
the one provided as an argument (ignoring location).</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">p</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">];</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span class="nx">p</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">p</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="p">}</span>
</pre></div>
<p>The <code>parseTokenKind</code> helper will consume a token if it is
the same kind as an argument provided.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseTokenKind</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">tokenKind</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">current</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">current</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">current</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="p">}</span>
</pre></div>
<p>And the <code>helpMessage</code> helper will print a message along
with the location and value of a token near the cursor, to indicate
where in the input something went wrong.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="o">+</span><span class="mi">1</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"[%d,%d]: %s, near: %s\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
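<p>To see how the two consuming helpers behave, here is a
self-contained sketch. The <code>token</code>, <code>tokenKind</code>,
and <code>equals</code> definitions below are minimal stand-ins assumed
for illustration (the real ones live in <code>lexer.go</code> and also
carry a location), and the helper bodies are condensed.</p>

```go
package main

import "fmt"

// Minimal stand-ins for the lexer types, assumed for this sketch.
type tokenKind uint

const (
	keywordKind tokenKind = iota
	identifierKind
)

type token struct {
	value string
	kind  tokenKind
}

// equals compares value and kind only.
func (t *token) equals(other *token) bool {
	return t.value == other.value && t.kind == other.kind
}

func parseToken(tokens []*token, initialCursor uint, t token) (*token, uint, bool) {
	if initialCursor >= uint(len(tokens)) {
		return nil, initialCursor, false
	}
	if p := tokens[initialCursor]; t.equals(p) {
		return p, initialCursor + 1, true
	}
	return nil, initialCursor, false
}

func parseTokenKind(tokens []*token, initialCursor uint, kind tokenKind) (*token, uint, bool) {
	if initialCursor >= uint(len(tokens)) {
		return nil, initialCursor, false
	}
	if current := tokens[initialCursor]; current.kind == kind {
		return current, initialCursor + 1, true
	}
	return nil, initialCursor, false
}

func main() {
	// Tokens for: select name
	tokens := []*token{{"select", keywordKind}, {"name", identifierKind}}

	// A match advances the cursor past the token.
	_, cursor, ok := parseToken(tokens, 0, token{"select", keywordKind})
	fmt.Println(cursor, ok)

	// A mismatch leaves the cursor where it was.
	_, cursor, ok = parseToken(tokens, cursor, token{"from", keywordKind})
	fmt.Println(cursor, ok)

	// Matching by kind accepts any identifier.
	id, cursor, ok := parseTokenKind(tokens, cursor, identifierKind)
	fmt.Println(id.value, cursor, ok)
}
```

<p>Returning the unchanged cursor on failure is what lets a caller try
one alternative after another from the same position.</p>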
<h3 id="parsing-binary-expressions">Parsing binary expressions</h3><p>Next we'll extend the AST structure in <code>ast.go</code> to
support a "binary kind" of expression. The binary expression will have
two sub-expressions and an operator.</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="p">(</span>
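<span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="nx">expressionKind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>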
<span class="w"> </span><span class="nx">binaryKind</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">binaryExpression</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="nx">expression</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">expression</span>
<span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="nx">token</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">expression</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">literal</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span>
<span class="w"> </span><span class="nx">binary</span><span class="w"> </span><span class="o">*</span><span class="nx">binaryExpression</span>
<span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">expressionKind</span>
<span class="p">}</span>
</pre></div>
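<p>As a quick illustration of how these structs nest, here is a
sketch that builds the tree for <code>age + 2 = 25</code> by hand. The
one-field <code>token</code> is a stand-in, and <code>render</code> and
<code>lit</code> are hypothetical helpers that are not part of
gosql.</p>

```go
package main

import "fmt"

// Stand-in for the lexer's token; only the value matters here.
type token struct{ value string }

type expressionKind uint

const (
	literalKind expressionKind = iota
	binaryKind
)

type binaryExpression struct {
	a  expression
	b  expression
	op token
}

type expression struct {
	literal *token
	binary  *binaryExpression
	kind    expressionKind
}

// render prints an expression fully parenthesized so the tree
// shape is visible.
func render(e expression) string {
	switch e.kind {
	case literalKind:
		return e.literal.value
	case binaryKind:
		return "(" + render(e.binary.a) + " " + e.binary.op.value + " " + render(e.binary.b) + ")"
	}
	return "?"
}

// lit wraps a value in a literal expression.
func lit(v string) expression {
	return expression{literal: &token{value: v}, kind: literalKind}
}

func main() {
	sum := expression{kind: binaryKind, binary: &binaryExpression{a: lit("age"), b: lit("2"), op: token{"+"}}}
	eq := expression{kind: binaryKind, binary: &binaryExpression{a: sum, b: lit("25"), op: token{"="}}}
	fmt.Println(render(eq))
}
```

<p>Only one of <code>literal</code> and <code>binary</code> is set on
any given expression, and <code>kind</code> says which one to
read.</p>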
<p>We'll use Pratt parsing to handle operator precedence. There is an
excellent introduction to this technique
<a href="https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html">here</a>.</p>
<p>If at the beginning of parsing we see a left parenthesis, we'll
consume it and parse an expression within it. Then we'll look for a
right parenthesis. Otherwise we'll look for a non-binary expression
first (e.g. symbol, number).</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">minBp</span><span class="w"> </span><span class="kt">uint</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">leftParenSymbol</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="nx">rightParenToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightParenSymbol</span><span class="p">)</span>
<span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">delimiters</span><span class="p">,</span><span class="w"> </span><span class="nx">rightParenToken</span><span class="p">),</span><span class="w"> </span><span class="nx">minBp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected expression after opening paren"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">rightParenToken</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected closing paren"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseLiteralExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
<p>Then we'll look for a binary operator (e.g. <code>=</code>,
<code>and</code>) or delimiter. If we find an operator and it is of
lesser "binding power" than the current minimum (<code>minBp</code>
passed as an argument to the parse function with a default value of
<code>0</code>), we'll return the current expression.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="nx">lastCursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cursor</span>
<span class="nx">outer</span><span class="p">:</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="nx">outer</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">binOps</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span>
<span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">andKeyword</span><span class="p">),</span>
<span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">orKeyword</span><span class="p">),</span>
<span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">eqSymbol</span><span class="p">),</span>
<span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">neqSymbol</span><span class="p">),</span>
<span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">concatSymbol</span><span class="p">),</span>
<span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">plusSymbol</span><span class="p">),</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">bo</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">binOps</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span>
<span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">bo</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected binary operator"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">bp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">op</span><span class="p">.</span><span class="nx">bindingPower</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bp</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">minBp</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">lastCursor</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
</pre></div>
<p>The <code>bindingPower</code> function on tokens can be defined for
now such that sum and concatenation have the highest binding power,
followed by equality operations, then boolean operators, and then
everything else at zero.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="nx">bindingPower</span><span class="p">()</span><span class="w"> </span><span class="kt">uint</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">keywordKind</span><span class="p">:</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">keyword</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">andKeyword</span><span class="p">:</span>
<span class="w"> </span><span class="k">fallthrough</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">orKeyword</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">symbolKind</span><span class="p">:</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">symbol</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">eqSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="k">fallthrough</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">neqSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">2</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">concatSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="k">fallthrough</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">plusSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">3</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span>
<span class="p">}</span>
</pre></div>
<p>Back in <code>parseExpression</code>, if the new operator's binding
power is at least <code>minBp</code> we'll parse the next operand
expression (a recursive call, passing the binding power of the new
operator as the new <code>minBp</code>).</p>
<p>Once the recursive call returns, the current expression is set to a
new binary expression containing the previously current expression on
the left and the just-parsed expression (the return value of the
recursive call) on the right.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="p">,</span><span class="w"> </span><span class="nx">bp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected right operand"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&</span><span class="nx">expression</span><span class="p">{</span>
<span class="w"> </span><span class="nx">binary</span><span class="p">:</span><span class="w"> </span><span class="o">&</span><span class="nx">binaryExpression</span><span class="p">{</span>
<span class="w"> </span><span class="o">*</span><span class="nx">exp</span><span class="p">,</span>
<span class="w"> </span><span class="o">*</span><span class="nx">b</span><span class="p">,</span>
<span class="w"> </span><span class="o">*</span><span class="nx">op</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">binaryKind</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="nx">lastCursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">cursor</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
<p>All together:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">minBp</span><span class="w"> </span><span class="kt">uint</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">leftParenSymbol</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="nx">rightParenToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightParenSymbol</span><span class="p">)</span>
<span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">delimiters</span><span class="p">,</span><span class="w"> </span><span class="nx">rightParenToken</span><span class="p">),</span><span class="w"> </span><span class="nx">minBp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected expression after opening paren"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">rightParenToken</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected closing paren"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseLiteralExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">lastCursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cursor</span>
<span class="nx">outer</span><span class="p">:</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="nx">outer</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">binOps</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span>
<span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">andKeyword</span><span class="p">),</span>
<span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">orKeyword</span><span class="p">),</span>
<span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">eqSymbol</span><span class="p">),</span>
<span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">neqSymbol</span><span class="p">),</span>
<span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">concatSymbol</span><span class="p">),</span>
<span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">plusSymbol</span><span class="p">),</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">bo</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">binOps</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span>
<span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">bo</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected binary operator"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">bp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">op</span><span class="p">.</span><span class="nx">bindingPower</span><span class="p">()</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bp</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nx">minBp</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">lastCursor</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="p">,</span><span class="w"> </span><span class="nx">bp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected right operand"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&</span><span class="nx">expression</span><span class="p">{</span>
<span class="w"> </span><span class="nx">binary</span><span class="p">:</span><span class="w"> </span><span class="o">&</span><span class="nx">binaryExpression</span><span class="p">{</span>
<span class="w"> </span><span class="o">*</span><span class="nx">exp</span><span class="p">,</span>
<span class="w"> </span><span class="o">*</span><span class="nx">b</span><span class="p">,</span>
<span class="w"> </span><span class="o">*</span><span class="nx">op</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">binaryKind</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="nx">lastCursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">cursor</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
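<p>To see how binding power shapes the resulting tree, here is a
minimal, self-contained sketch of the same loop. It is not the code
from this project: it works on whitespace-separated strings instead of
lexed tokens, skips parentheses and delimiters, and the names
<code>node</code>, <code>parse</code> and <code>render</code> are
hypothetical. The binding powers follow the ordering described above
(arithmetic and concatenation, then equality, then boolean
operators).</p>

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, simplified stand-in for the expression type.
type node struct {
	literal string // set for leaf nodes
	op      string // set for binary nodes
	a, b    *node
}

// String renders binary nodes with explicit parentheses so grouping is visible.
func (n *node) String() string {
	if n.a == nil {
		return n.literal
	}
	return "(" + n.a.String() + " " + n.op + " " + n.b.String() + ")"
}

// bindingPower is keyed on plain strings here (an assumption of this sketch).
func bindingPower(op string) uint {
	switch op {
	case "and", "or":
		return 1
	case "=", "<>":
		return 2
	case "||", "+":
		return 3
	}
	return 0
}

// parse reads one operand, then keeps folding in operators whose binding
// power is not below minBp, recursing for each right-hand operand.
func parse(tokens []string, cursor int, minBp uint) (*node, int, bool) {
	if cursor >= len(tokens) {
		return nil, cursor, false
	}
	exp := &node{literal: tokens[cursor]}
	cursor++
	for cursor < len(tokens) {
		op := tokens[cursor]
		bp := bindingPower(op)
		if bp == 0 {
			return nil, cursor, false // not a binary operator this sketch knows
		}
		if bp < minBp {
			break // leave the operator for the caller to consume
		}
		right, next, ok := parse(tokens, cursor+1, bp)
		if !ok {
			return nil, cursor, false
		}
		exp = &node{op: op, a: exp, b: right}
		cursor = next
	}
	return exp, cursor, true
}

// render parses s and returns the parenthesized tree as a string.
func render(s string) string {
	n, _, ok := parse(strings.Fields(s), 0, 0)
	if !ok {
		return "<parse error>"
	}
	return n.String()
}

func main() {
	fmt.Println(render("a = 1 and b = 2 + 3")) // ((a = 1) and (b = (2 + 3)))
}
```

<p>Because the loop only stops when the next operator's binding power
is strictly less than <code>minBp</code>, operators of equal power
group to the right in this sketch: <code>1 + 2 + 3</code> renders as
<code>(1 + (2 + 3))</code>.</p>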
<p>Now that we have this general parse expression helper in place, we can
add support for parsing <code>WHERE</code> in <code>SELECT</code>
statements.</p>
<h3 id="parsing-where">Parsing WHERE</h3><p>This part's pretty easy. We modify the existing
<code>parseSelectStatement</code> to search for an optional
<code>WHERE</code> token followed by an expression.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseSelectStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="kt">bool</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">selectKeyword</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">slct</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">SelectStatement</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">fromToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">fromKeyword</span><span class="p">)</span>
<span class="w"> </span><span class="nx">item</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseSelectItem</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span><span class="nx">fromToken</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">})</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">item</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="nx">whereToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">whereKeyword</span><span class="p">)</span>
<span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span><span class="nx">delimiter</span><span class="p">,</span><span class="w"> </span><span class="nx">whereToken</span><span class="p">}</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">fromToken</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">from</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseFromItem</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected FROM item"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">from</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">whereToken</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">where</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span><span class="nx">delimiter</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected WHERE conditionals"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">where</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">slct</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
<p>Now we're all done with parsing binary expressions and
<code>WHERE</code> filters! If in doubt, refer to
<a href="https://github.com/eatonphil/gosql/blob/master/parser.go">parser.go</a>
in the project.</p>
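<p>To make the optional-clause pattern concrete, here's a minimal standalone
sketch. It is not the gosql parser: tokens are plain strings, and
<code>parseKeyword</code>, <code>selectParts</code>, and
<code>parseSelect</code> are hypothetical names made up for this
illustration. The only idea carried over is that a failed keyword match
leaves the cursor where it was, so the next optional clause can be tried.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// parseKeyword is a hypothetical stand-in for parseToken: on a match it
// returns the cursor advanced past the keyword, otherwise it returns the
// cursor unchanged so the caller can try the next optional clause.
func parseKeyword(tokens []string, cursor uint, keyword string) (uint, bool) {
	if cursor < uint(len(tokens)) && strings.EqualFold(tokens[cursor], keyword) {
		return cursor + 1, true
	}
	return cursor, false
}

// selectParts records which optional clauses were present.
type selectParts struct {
	item  string
	from  string
	where []string
}

func parseSelect(tokens []string) (*selectParts, bool) {
	cursor, ok := parseKeyword(tokens, 0, "select")
	if !ok || cursor >= uint(len(tokens)) {
		return nil, false
	}
	parts := selectParts{item: tokens[cursor]}
	cursor++

	// FROM is optional, but once seen it must be followed by a table name.
	if next, ok := parseKeyword(tokens, cursor, "from"); ok {
		if next >= uint(len(tokens)) {
			return nil, false
		}
		parts.from = tokens[next]
		cursor = next + 1
	}

	// WHERE is optional too; everything up to the ";" delimiter is treated
	// as its (unparsed) expression in this sketch.
	if next, ok := parseKeyword(tokens, cursor, "where"); ok {
		end := next
		for end < uint(len(tokens)) && tokens[end] != ";" {
			end++
		}
		if end == next {
			return nil, false // WHERE with no condition
		}
		parts.where = tokens[next:end]
	}
	return &parts, true
}

func main() {
	for _, q := range []string{
		"SELECT 1 ;",
		"SELECT name FROM users ;",
		"SELECT name FROM users WHERE id = 2 ;",
		"SELECT name FROM users WHERE ;",
	} {
		parts, ok := parseSelect(strings.Fields(q))
		fmt.Println(q, "->", ok, parts)
	}
}
```

<p>The last query fails because a <code>WHERE</code> keyword was found but
no expression followed it, which mirrors the "Expected WHERE conditionals"
branch above.</p>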
<h3 id="re-thinking-query-execution">Re-thinking query execution</h3><p>In the first post in this series, we didn't establish a standard way
to interpret an expression in any kind of statement. In SQL,
though, every expression is run in the context of a row in a
table. We'll handle cases like <code>SELECT 1</code> and <code>INSERT INTO
users VALUES (1)</code> by creating a table with a single empty row to act
as the context.</p>
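<p>Here's a rough sketch of why the single empty row helps. The
<code>row</code> and <code>table</code> types, <code>evaluate</code>, and
<code>selectTerm</code> below are simplified, hypothetical stand-ins (cells
are plain strings), not the implementation we build later. The point is
that the same per-row loop serves both a real table and a
<code>FROM</code>-less query.</p>

```go
package main

import "fmt"

// row and table are simplified, hypothetical stand-ins for the real types.
type row []string

type table struct {
	columns []string
	rows    []row
}

// evaluate resolves a term against one row: a known column name yields
// that row's value, anything else is treated as a literal.
func (t *table) evaluate(rowIndex int, term string) string {
	for i, col := range t.columns {
		if col == term {
			return t.rows[rowIndex][i]
		}
	}
	return term
}

// selectTerm evaluates term once per row of the table.
func selectTerm(t *table, term string) []string {
	results := []string{}
	for i := range t.rows {
		results = append(results, t.evaluate(i, term))
	}
	return results
}

func main() {
	users := &table{columns: []string{"name"}, rows: []row{{"Phil"}, {"Kate"}}}
	fmt.Println(selectTerm(users, "name"))

	// No FROM clause: one empty row gives the literal a context to run in,
	// so the loop body executes exactly once.
	dummy := &table{rows: []row{{}}}
	fmt.Println(selectTerm(dummy, "1"))
}
```

<p>With zero rows the loop would never run and <code>SELECT 1</code> would
return nothing, which is why the placeholder table has exactly one row.</p>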
<p>This requires a bit of re-architecting. So we'll rewrite the
<code>memory.go</code> implementation in this post from scratch.</p>
<p>We'll also stop <code>panic</code>-ing when things go wrong. Instead
we'll print a message, which lets the REPL keep going.</p>
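<p>As a small illustration of that behavior, here's a sketch of a loop
that reports an error and moves on to the next statement. The
<code>execute</code> function, <code>errUnsupported</code>, and the
<code>repl</code> helper are hypothetical placeholders, and input comes
from a string rather than a terminal so the example is easy to run.</p>

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

var errUnsupported = errors.New("unsupported statement")

// execute is a hypothetical placeholder for parsing and running a statement.
func execute(line string) (string, error) {
	if !strings.HasPrefix(strings.ToUpper(line), "SELECT") {
		return "", errUnsupported
	}
	return "ok", nil
}

// repl runs each non-empty line of input, printing errors instead of
// panicking, and returns how many statements failed.
func repl(input string) int {
	failures := 0
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		result, err := execute(line)
		if err != nil {
			// Report the problem and keep reading the next statement.
			fmt.Println("Error:", err)
			failures++
			continue
		}
		fmt.Println(result)
	}
	return failures
}

func main() {
	repl("SELECT 1;\nDROP everything;\nSELECT 2;\n")
}
```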
<h4 id="memory-cells">Memory cells</h4><p>Again, the fundamental block of memory in the table will be an untyped
array of bytes. We'll provide methods to convert a memory cell
into Go integer, string, and boolean values.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">AsInt</span><span class="p">()</span><span class="w"> </span><span class="kt">int32</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="kt">int32</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">NewBuffer</span><span class="p">(</span><span class="nx">mc</span><span class="p">),</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">BigEndian</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Corrupted data [%s]: %s\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">mc</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">AsText</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">mc</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">mc</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">equals</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Seems verbose but need to make sure if one is nil, the</span>
<span class="w"> </span><span class="c1">// comparison still fails quickly</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">mc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">mc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Equal</span><span class="p">(</span><span class="nx">mc</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
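<p>A quick round trip shows how these conversions behave. This sketch
redeclares <code>MemoryCell</code> with only the methods it needs so it can
run on its own, and <code>encodeInt</code> is a hypothetical helper added
just for the demonstration. One thing worth noticing: because
<code>AsBool</code> only checks the length, a cell holding the integer
<code>0</code> still reads as <code>true</code>.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

type MemoryCell []byte

// encodeInt is a hypothetical helper that stores an int32 as four
// big-endian bytes, the layout AsInt expects.
func encodeInt(i int32) MemoryCell {
	buf := new(bytes.Buffer)
	if err := binary.Write(buf, binary.BigEndian, i); err != nil {
		return nil
	}
	return MemoryCell(buf.Bytes())
}

func (mc MemoryCell) AsInt() int32 {
	var i int32
	if err := binary.Read(bytes.NewBuffer(mc), binary.BigEndian, &i); err != nil {
		fmt.Printf("Corrupted data [%s]: %s\n", mc, err)
		return 0
	}
	return i
}

func (mc MemoryCell) AsBool() bool {
	return len(mc) != 0
}

func main() {
	cell := encodeInt(-42)
	fmt.Println(len(cell), cell.AsInt()) // 4 -42

	// Four zero bytes are still a non-empty cell, so this prints true.
	fmt.Println(encodeInt(0).AsBool())

	// Only a nil (or empty) cell is false.
	fmt.Println(MemoryCell(nil).AsBool())

	// Too few bytes for an int32: a message is printed and 0 is returned.
	fmt.Println(MemoryCell("ab").AsInt())
}
```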
<p>We'll also extend the <code>Cell</code> interface in
<code>backend.go</code> to support the new boolean type.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">gosql</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">ColumnType</span><span class="w"> </span><span class="kt">uint</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="nx">ColumnType</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>
<span class="w"> </span><span class="nx">IntType</span>
<span class="w"> </span><span class="nx">BoolType</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">Cell</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">AsText</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">AsInt</span><span class="p">()</span><span class="w"> </span><span class="kt">int32</span>
<span class="w"> </span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="kt">bool</span>
<span class="p">}</span>
<span class="o">...</span>
</pre></div>
<p>Finally, we need a way to map a literal token <em>into</em> a memory
cell.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">numericKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">new</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Buffer</span><span class="p">)</span>
<span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Corrupted data [%s]: %s\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// TODO: handle bigint</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">BigEndian</span><span class="p">,</span><span class="w"> </span><span class="nb">int32</span><span class="p">(</span><span class="nx">i</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"Corrupted data [%s]: %s\n"</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">buf</span><span class="p">.</span><span class="nx">Bytes</span><span class="p">()),</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="nx">buf</span><span class="p">.</span><span class="nx">Bytes</span><span class="p">())</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">stringKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">boolKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">"true"</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">([]</span><span class="kt">byte</span><span class="p">{</span><span class="mi">1</span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
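<p>One caveat with the numeric branch: <code>strconv.Atoi</code> returns an
<code>int</code>, so the <code>int32(i)</code> conversion silently truncates
values that don't fit in 32 bits. Until bigint is handled, one possible
guard is to parse with an explicit bit size. This is only a sketch, and
<code>encodeInt32</code> is a hypothetical helper, not part of the
project.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"strconv"
)

// encodeInt32 is a hypothetical helper that turns a numeric literal into
// four big-endian bytes, rejecting values outside the int32 range.
func encodeInt32(value string) ([]byte, error) {
	// bitSize 32 makes ParseInt report out-of-range values instead of
	// leaving the caller to truncate them.
	i, err := strconv.ParseInt(value, 10, 32)
	if err != nil {
		return nil, err
	}
	buf := new(bytes.Buffer)
	if err := binary.Write(buf, binary.BigEndian, int32(i)); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	for _, v := range []string{"12", "2147483647", "2147483648"} {
		cell, err := encodeInt32(v)
		fmt.Println(v, cell, err)
	}
}
```

<p>The last value is one past the largest <code>int32</code>, so it comes
back as an error rather than a wrapped-around number.</p>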
<p>And we'll provide global <code>true</code> and <code>false</code>
values:</p>
<div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">trueToken</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">token</span><span class="p">{</span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">boolKind</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="s">"true"</span><span class="p">}</span>
<span class="w"> </span><span class="nx">falseToken</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">token</span><span class="p">{</span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">boolKind</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="s">"false"</span><span class="p">}</span>
<span class="w"> </span><span class="nx">trueMemoryCell</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span class="o">&</span><span class="nx">trueToken</span><span class="p">)</span>
<span class="w"> </span><span class="nx">falseMemoryCell</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span class="o">&</span><span class="nx">falseToken</span><span class="p">)</span>
<span class="p">)</span>
</pre></div>
<h4 id="tables">Tables</h4><p>A table has a list of rows (each an array of memory cells) and a list of
column names and types.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">table</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="nx">columnTypes</span><span class="w"> </span><span class="p">[]</span><span class="nx">ColumnType</span>
<span class="w"> </span><span class="nx">rows</span><span class="w"> </span><span class="p">[][]</span><span class="nx">MemoryCell</span>
<span class="p">}</span>
</pre></div>
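<p>Since the column names, column types, and each row's cells are parallel
slices, they have to stay the same length. Here's a small sketch that
builds a table by hand and checks that. The <code>validate</code> and
<code>columnIndex</code> methods are hypothetical additions for
illustration, and the types are redeclared so the example runs on its
own.</p>

```go
package main

import "fmt"

type ColumnType uint

const (
	TextType ColumnType = iota
	IntType
	BoolType
)

type MemoryCell []byte

type table struct {
	columns     []string
	columnTypes []ColumnType
	rows        [][]MemoryCell
}

// columnIndex (hypothetical) returns the position of a column, or -1.
func (t *table) columnIndex(name string) int {
	for i, col := range t.columns {
		if col == name {
			return i
		}
	}
	return -1
}

// validate (hypothetical) checks that names, types, and cells line up.
func (t *table) validate() error {
	if len(t.columns) != len(t.columnTypes) {
		return fmt.Errorf("%d column names but %d column types", len(t.columns), len(t.columnTypes))
	}
	for i, r := range t.rows {
		if len(r) != len(t.columns) {
			return fmt.Errorf("row %d has %d cells, expected %d", i, len(r), len(t.columns))
		}
	}
	return nil
}

func main() {
	users := table{
		columns:     []string{"id", "name"},
		columnTypes: []ColumnType{IntType, TextType},
		rows: [][]MemoryCell{
			{MemoryCell{0, 0, 0, 1}, MemoryCell("Phil")},
			{MemoryCell{0, 0, 0, 2}, MemoryCell("Kate")},
		},
	}
	fmt.Println(users.validate())

	// The same index addresses the name, the type, and the cell in a row.
	i := users.columnIndex("name")
	fmt.Println(users.columnTypes[i] == TextType, string(users.rows[1][i]))
}
```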
<p>Finally, we'll add a series of methods on <code>table</code> that,
given a row index, interpret an expression AST against that row in
the table.</p>
<h3 id="interpreting-literals">Interpreting literals</h3><p>First we'll implement <code>evaluateLiteralCell</code>, which looks
up an identifier in the row or returns the value of an integer, string, or
boolean literal.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">)</span><span class="w"> </span><span class="nx">evaluateLiteralCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">MemoryCell</span><span class="p">,</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ColumnType</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidCell</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">lit</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">literal</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">identifierKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">tableCol</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">tableCol</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="p">[</span><span class="nx">rowIndex</span><span class="p">][</span><span class="nx">i</span><span class="p">],</span><span class="w"> </span><span class="nx">tableCol</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrColumnDoesNotExist</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">IntType</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">stringKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">TextType</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">boolKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">BoolType</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span class="nx">lit</span><span class="p">),</span><span class="w"> </span><span class="s">"?column?"</span><span class="p">,</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<h3 id="interpreting-binary-expressions">Interpreting binary expressions</h3><p>Now we can implement <code>evaluateBinaryCell</code>, which will
evaluate its two sub-expressions and combine them according
to the operator. The SQL operators we have defined so far do no
coercion. Equality between operands of different types is simply
<code>false</code> (and inequality <code>true</code>), while the remaining
operators fail immediately with <code>ErrInvalidOperands</code> if their
arguments have the wrong type: concatenation requires two strings,
addition requires two integers, and <code>AND</code> and <code>OR</code>
require two booleans.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">)</span><span class="w"> </span><span class="nx">evaluateBinaryCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">MemoryCell</span><span class="p">,</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ColumnType</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">binaryKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidCell</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">bexp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">binary</span>
<span class="w"> </span><span class="nx">l</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">lt</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">bexp</span><span class="p">.</span><span class="nx">a</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">rt</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">bexp</span><span class="p">.</span><span class="nx">b</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">bexp</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">symbolKind</span><span class="p">:</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">symbol</span><span class="p">(</span><span class="nx">bexp</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">eqSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="nx">eq</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">eq</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">trueMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">"?column?"</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">IntType</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">IntType</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">eq</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">trueMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">"?column?"</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">eq</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">trueMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">"?column?"</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">falseMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">"?column?"</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">neqSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">!</span><span class="nx">l</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">trueMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">"?column?"</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">falseMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">"?column?"</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">concatSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidOperands</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span class="o">&</span><span class="nx">token</span><span class="p">{</span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">stringKind</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">AsText</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">AsText</span><span class="p">()}),</span><span class="w"> </span><span class="s">"?column?"</span><span class="p">,</span><span class="w"> </span><span class="nx">TextType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">plusSymbol</span><span class="p">:</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">IntType</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">IntType</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidOperands</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">iValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">())</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span class="o">&</span><span class="nx">token</span><span class="p">{</span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">numericKind</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Itoa</span><span class="p">(</span><span class="nx">iValue</span><span class="p">)}),</span><span class="w"> </span><span class="s">"?column?"</span><span class="p">,</span><span class="w"> </span><span class="nx">IntType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="c1">// TODO</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">keywordKind</span><span class="p">:</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">keyword</span><span class="p">(</span><span class="nx">bexp</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">andKeyword</span><span class="p">:</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidOperands</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">falseMemoryCell</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">trueMemoryCell</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"?column?"</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">orKeyword</span><span class="p">:</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidOperands</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">falseMemoryCell</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">trueMemoryCell</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">"?column?"</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="c1">// TODO</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidCell</span>
<span class="p">}</span>
</pre></div>
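<p>To see the no-coercion rule in isolation, here is a minimal, self-contained
sketch. It is not the code above: <code>value</code>, <code>valueType</code>,
<code>apply</code>, and the operator strings are hypothetical stand-ins for
<code>MemoryCell</code>, <code>ColumnType</code>, and the token-based operators,
chosen only so the example runs on its own.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// valueType is a hypothetical stand-in for ColumnType.
type valueType int

const (
	textType valueType = iota
	intType
	boolType
)

// value is a hypothetical stand-in for a MemoryCell plus its type.
// Only the field matching typ is set; the others stay at their zero value.
type value struct {
	typ  valueType
	text string
	num  int
	flag bool
}

var errInvalidOperands = errors.New("invalid operands")

// apply combines two already-evaluated operands without any coercion.
func apply(op string, l, r value) (value, error) {
	switch op {
	case "=":
		// Struct equality compares the type tag too, so values of
		// different types are never equal.
		return value{typ: boolType, flag: l == r}, nil
	case "<>":
		return value{typ: boolType, flag: l != r}, nil
	case "||":
		if l.typ != textType || r.typ != textType {
			return value{}, errInvalidOperands
		}
		return value{typ: textType, text: l.text + r.text}, nil
	case "+":
		if l.typ != intType || r.typ != intType {
			return value{}, errInvalidOperands
		}
		return value{typ: intType, num: l.num + r.num}, nil
	case "AND", "OR":
		if l.typ != boolType || r.typ != boolType {
			return value{}, errInvalidOperands
		}
		if op == "AND" {
			return value{typ: boolType, flag: l.flag && r.flag}, nil
		}
		return value{typ: boolType, flag: l.flag || r.flag}, nil
	}
	// This sketch rejects unknown operators with the same error.
	return value{}, errInvalidOperands
}

func main() {
	one := value{typ: intType, num: 1}
	a := value{typ: textType, text: "a"}

	sum, _ := apply("+", one, one)
	fmt.Println(sum.num) // 2

	eq, _ := apply("=", one, a)
	fmt.Println(eq.flag) // false: different types never compare equal

	_, err := apply("+", one, a)
	fmt.Println(err) // invalid operands
}
```

<p>The type checks come before any arithmetic or concatenation, so a bad
operand pair surfaces as an error rather than as a silently coerced result.</p>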
<p>Then we'll provide a generic <code>evaluateCell</code> method that
dispatches to <code>evaluateLiteralCell</code> or
<code>evaluateBinaryCell</code> based on the expression's kind:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">)</span><span class="w"> </span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">MemoryCell</span><span class="p">,</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ColumnType</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">literalKind</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateLiteralCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">binaryKind</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateBinaryCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="p">)</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidCell</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
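<p>Because <code>evaluateBinaryCell</code> calls back into
<code>evaluateCell</code> for each side, nested expressions are evaluated
recursively. The following is a stripped-down sketch of that mutual recursion,
assuming a hypothetical <code>expr</code> type that only knows integer literals
and <code>+</code>. The names <code>literalKind</code> and
<code>binaryKind</code> are redefined locally so the example stands alone.</p>

```go
package main

import (
	"errors"
	"fmt"
)

type exprKind int

const (
	literalKind exprKind = iota
	binaryKind
)

// expr is a hypothetical, simplified expression node.
type expr struct {
	kind exprKind
	lit  int    // used when kind == literalKind
	op   string // used when kind == binaryKind
	a, b *expr  // operands when kind == binaryKind
}

var errInvalidCell = errors.New("invalid cell")

// evaluate dispatches on the node kind; the binary case recurses into
// both operands before combining them.
func evaluate(e *expr) (int, error) {
	if e == nil {
		return 0, errInvalidCell
	}
	switch e.kind {
	case literalKind:
		return e.lit, nil
	case binaryKind:
		l, err := evaluate(e.a)
		if err != nil {
			return 0, err
		}
		r, err := evaluate(e.b)
		if err != nil {
			return 0, err
		}
		if e.op != "+" {
			return 0, errInvalidCell
		}
		return l + r, nil
	default:
		return 0, errInvalidCell
	}
}

// num builds a literal node.
func num(n int) *expr { return &expr{kind: literalKind, lit: n} }

func main() {
	// (1 + 2) + 3
	tree := &expr{kind: binaryKind, op: "+",
		a: &expr{kind: binaryKind, op: "+", a: num(1), b: num(2)},
		b: num(3)}
	fmt.Println(evaluate(tree)) // 6 <nil>
}
```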
<h3 id="implementing-select">Implementing SELECT</h3><p>As before, each statement will operate on a backend of tables.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">MemoryBackend</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tables</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="nx">table</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">NewMemoryBackend</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">MemoryBackend</span><span class="p">{</span>
<span class="w"> </span><span class="nx">tables</span><span class="p">:</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="nx">table</span><span class="p">{},</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>When we implement <code>SELECT</code>, we'll iterate over each row in
the table (we only support looking up one table for now). If the
<code>SELECT</code> statement contains a <code>WHERE</code> block,
we'll evaluate the <code>WHERE</code> expression against the current
row and move on if the result is <code>false</code>.</p>
<p>Otherwise, we'll evaluate each expression in the <code>SELECT</code> list
against the current row in the table.</p>
<p>If there is no table selected, we provide a fake table with a single
empty row.</p>
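<p>Before the full implementation, the shape of that loop can be sketched on
its own. This is only an illustration under simplifying assumptions:
<code>row</code>, <code>selectRows</code>, and the <code>where</code> and
<code>project</code> callbacks are hypothetical, and a <code>nil</code> table
stands for a statement with no <code>FROM</code>.</p>

```go
package main

import "fmt"

// row and table are hypothetical, simplified stand-ins.
type row []int

type table struct {
	rows []row
}

// selectRows filters rows with where (nil means no WHERE block) and
// evaluates a single select item, project, against each surviving row.
func selectRows(t *table, where func(row) bool, project func(row) int) []int {
	if t == nil {
		// No table selected: use a fake table with one empty row so
		// that the select item is still evaluated exactly once.
		t = &table{rows: []row{{}}}
	}
	out := []int{}
	for _, r := range t.rows {
		if where != nil && !where(r) {
			continue // WHERE evaluated to false: skip this row
		}
		out = append(out, project(r))
	}
	return out
}

func main() {
	users := &table{rows: []row{{1}, {2}, {3}}}

	// Roughly: SELECT id + 10 FROM users WHERE id <> 2
	fmt.Println(selectRows(users,
		func(r row) bool { return r[0] != 2 },
		func(r row) int { return r[0] + 10 })) // [11 13]

	// Roughly: SELECT 42
	fmt.Println(selectRows(nil, nil, func(row) int { return 42 })) // [42]
}
```

<p>The fake single-row table is what lets a literal-only query still return
one row of results.</p>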
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="nx">slct</span><span class="w"> </span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&</span><span class="nx">table</span><span class="p">{}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="p">.</span><span class="nx">table</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="kt">bool</span>
<span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="p">.</span><span class="nx">table</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Results</span><span class="p">{},</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">Cell</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Type</span><span class="w"> </span><span class="nx">ColumnType</span>
<span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="p">}{}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&</span><span class="nx">table</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">MemoryCell</span><span class="p">{{}}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">Cell</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">results</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">val</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">asterisk</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// TODO: handle asterisk</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Skipping asterisk."</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">columnName</span><span class="p">,</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="o">*</span><span class="nx">col</span><span class="p">.</span><span class="nx">exp</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Type</span><span class="w"> </span><span class="nx">ColumnType</span>
<span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="p">}{</span>
<span class="w"> </span><span class="nx">Type</span><span class="p">:</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="nx">columnName</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">result</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Results</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Columns</span><span class="p">:</span><span class="w"> </span><span class="nx">columns</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Rows</span><span class="p">:</span><span class="w"> </span><span class="nx">results</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<h3 id="implementing-insert,-create">Implementing INSERT, CREATE</h3><p>The <code>INSERT</code> and <code>CREATE</code> statements stay mostly
the same, except that we'll use the <code>evaluateCell</code> helper
for every expression. Refer back to the first post if the
implementation is otherwise unclear.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Insert</span><span class="p">(</span><span class="nx">inst</span><span class="w"> </span><span class="o">*</span><span class="nx">InsertStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">inst</span><span class="p">.</span><span class="nx">table</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">MemoryCell</span><span class="p">{}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="o">*</span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrMissingValues</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="o">*</span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Skipping non-literal."</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">emptyTable</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&</span><span class="nx">table</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">emptyTable</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">crt</span><span class="w"> </span><span class="o">*</span><span class="nx">CreateTableStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">table</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">crt</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&</span><span class="nx">t</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">crt</span><span class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="o">*</span><span class="nx">crt</span><span class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="nx">ColumnType</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">datatype</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"int"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">IntType</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"text"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">TextType</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrInvalidDatatype</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="p">,</span><span class="w"> </span><span class="nx">dt</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<h3 id="back-to-the-repl">Back to the REPL</h3><p>Putting it all together, we run the following session:</p>
<div class="highlight"><pre><span></span><span class="err">#</span><span class="w"> </span><span class="nx">CREATE</span><span class="w"> </span><span class="nx">TABLE</span><span class="w"> </span><span class="nx">users</span><span class="w"> </span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="nx">TEXT</span><span class="p">,</span><span class="w"> </span><span class="nx">age</span><span class="w"> </span><span class="nx">INT</span><span class="p">);</span>
<span class="nx">ok</span>
<span class="err">#</span><span class="w"> </span><span class="nx">INSERT</span><span class="w"> </span><span class="nx">INTO</span><span class="w"> </span><span class="nx">users</span><span class="w"> </span><span class="nx">VALUES</span><span class="w"> </span><span class="p">(</span><span class="err">'</span><span class="nx">Stephen</span><span class="err">'</span><span class="p">,</span><span class="w"> </span><span class="mi">16</span><span class="p">);</span>
<span class="nx">ok</span>
<span class="err">#</span><span class="w"> </span><span class="nx">SELECT</span><span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">age</span><span class="w"> </span><span class="nx">FROM</span><span class="w"> </span><span class="nx">users</span><span class="p">;</span>
<span class="nx">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nx">age</span>
<span class="o">----------+------</span>
<span class="nx">Stephen</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">16</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="nx">result</span><span class="p">)</span>
<span class="nx">ok</span>
<span class="err">#</span><span class="w"> </span><span class="nx">INSERT</span><span class="w"> </span><span class="nx">INTO</span><span class="w"> </span><span class="nx">users</span><span class="w"> </span><span class="nx">VALUES</span><span class="w"> </span><span class="p">(</span><span class="err">'</span><span class="nx">Adrienne</span><span class="err">'</span><span class="p">,</span><span class="w"> </span><span class="mi">23</span><span class="p">);</span>
<span class="nx">ok</span>
<span class="err">#</span><span class="w"> </span><span class="nx">SELECT</span><span class="w"> </span><span class="nx">age</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="nx">FROM</span><span class="w"> </span><span class="nx">users</span><span class="w"> </span><span class="nx">WHERE</span><span class="w"> </span><span class="nx">age</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">23</span><span class="p">;</span>
<span class="nx">age</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nx">name</span>
<span class="o">------+-----------</span>
<span class="mi">25</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nx">Adrienne</span>
<span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="nx">result</span><span class="p">)</span>
<span class="nx">ok</span>
<span class="err">#</span><span class="w"> </span><span class="nx">SELECT</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="nx">FROM</span><span class="w"> </span><span class="nx">users</span><span class="p">;</span>
<span class="nx">name</span>
<span class="o">------------</span>
<span class="nx">Stephen</span>
<span class="nx">Adrienne</span>
<span class="p">(</span><span class="mi">2</span><span class="w"> </span><span class="nx">results</span><span class="p">)</span>
<span class="nx">ok</span>
</pre></div>
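<p>As a rough illustration of what the second <code>SELECT</code> in this
session does, the sketch below hard-codes the same filter and projection
over plain Go structs. The <code>user</code>, <code>projected</code>, and
<code>selectWhereAge</code> names are hypothetical and exist only for this
example; the real backend evaluates arbitrary expressions through
<code>evaluateCell</code> instead.</p>

```go
package main

import "fmt"

// user is a hypothetical stand-in for one row of the users table.
type user struct {
	name string
	age  int
}

// projected is one output row of SELECT age + 2, name.
type projected struct {
	agePlusTwo int
	name       string
}

// selectWhereAge mimics SELECT age + 2, name FROM users WHERE age = <age>.
func selectWhereAge(rows []user, age int) []projected {
	out := []projected{}
	for _, r := range rows {
		// WHERE: skip rows that fail the filter before projecting anything.
		if r.age != age {
			continue
		}
		// Projection: evaluate each selected expression against the row.
		out = append(out, projected{agePlusTwo: r.age + 2, name: r.name})
	}
	return out
}

func main() {
	users := []user{{"Stephen", 16}, {"Adrienne", 23}}
	for _, p := range selectWhereAge(users, 23) {
		fmt.Printf("%d | %s\n", p.agePlusTwo, p.name)
	}
}
```

<p>Filtering first and projecting second is the same order the
<code>Select</code> loop uses: a row that fails the <code>WHERE</code>
check never has its column expressions evaluated.</p>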
<p>And that's it for now! In future posts we'll get into indices, joining
tables, etc.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Latest post up in the database basics series: adding support for binary expressions and WHERE filtering in SELECTs.<br><br>Much nicer to have a real table rendering library and readline implementation in the REPL too.<a href="https://t.co/GYzn3FUNon">https://t.co/GYzn3FUNon</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1249426633347473408?ref_src=twsrc%5Etfw">April 12, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/database-basics-expressions-and-where.htmlSun, 12 Apr 2020 00:00:00 +0000
- Studying foreign languages with inbox zerohttp://notes.eatonphil.com/studying-with-inbox-zero.html<p>The only time I've been able to seriously, rapidly improve my ability
to speak a foreign language was through intensive language courses in
college. I was forced to actively speak, read, and write Chinese for
6-8 hours a week (1-2 hours every day). Then I studied another 5-10 hours
a week in preparation for the active sessions. I went three semesters
like this before I left school.</p>
<p>I've been trying to recreate that intensity since and mostly
failed. After marrying a Korean, I've redirected the little effort I
can muster to learning Korean. Aside from stints over the years
(mostly for a month or two before or after a trip to Korea), I haven't
been able to keep up any practice.</p>
<p>One thing I've tried over the years, to commit myself to learning a
number of different topics, is setting up recurring calendar invites:
"Study Linux", "Study TCP/IP", "Study Korean", etc.</p>
<p>This has mostly failed too. However, I do always <em>look</em> at the
invites as I get notified.</p>
<p>I keep inbox zero and I check my email many times a day, marking each
email read diligently when I no longer need to think about it.</p>
<p>Tools like Quizlet, Anki, or even Duolingo let you self-learn
vocabulary <em>when you feel like it</em>. But basically no service
will try to keep giving you exposure to some set of topics whether you
spend time on it or not.</p>
<p>The most important thing I can think of is forced exposure to
vocabulary. So I've been planning for some time to hook up a list of
the one thousand most common Korean words to scheduled emails.</p>
<p>This weekend I finally got around to scripting the Google Calendar API
against the words list. I have an event for each word for the next
1000 days. Each day I receive a summary email including all events of
the day and the new word is part of it.</p>
<p>This is an indirect approach, but it's simple to set
up. It is not easy to reconfigure, though.</p>
<p>The code for doing this is <a href="https://github.com/eatonphil/learnit">available on
GitHub</a> if you're
interested. And if you know a service that can build and manage
scheduled notifications against a spreadsheet or database I'd rather
be looking at that.</p>
<p>We'll see how this works out.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Daily new words in my inbox feels like the only way I can "force" myself to get exposed to new vocabulary. Wish there were a service for scheduling notifications from a spreadsheet. Finally got to scripting GCal's API populating daily events from 1000 most common Korean words</p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1246557948068925441?ref_src=twsrc%5Etfw">April 4, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/studying-with-inbox-zero.htmlSat, 04 Apr 2020 00:00:00 +0000
- Reviewing the Surface Book 2http://notes.eatonphil.com/reviewing-the-surface-book-2.html<p>The first few paragraphs cover what I was looking for and what I
considered. Then the review.</p>
<h3 id="why-the-surface-book-2">Why the Surface Book 2</h3><p>I used a Macbook throughout my professional career until I had the
choice a few years ago when I started my current job. Here, I ran
Gentoo, then FreeBSD, then Arch, and now Windows 10 on the Dell XPS
15.</p>
<p>I enjoy Windows and I think Microsoft is doing a better job on
hardware and software these days. At least, compared to Apple, they
appear to be trying. So when my personal 2015 Macbook Pro died this
year I decided to buy and run Windows at home.</p>
<p>On my Mac, I dealt with bad battery life for a while: running VMs,
running Docker, compiling Go, running Node.js kills any battery. So I
moved my development into the cloud and gained on battery life and
network speeds at the cost of memory (I am paying for 4GB of RAM).</p>
<p>My ideal replacement was a cheaper machine that felt as good as a 2015
Macbook Pro. (The build quality has not been good since.) I was
hoping not to pay more than $1000. My shortlist included the Surface
Book 2, the Surface Pro X, the Surface Laptop 3, the Lenovo Yoga 14,
and the LG Gram. So I went to Best Buy to try them out.</p>
<p>I was impressed by every Surface device. At first sight, I mistook the
Surface Book and Surface Laptop for an old Macbook Pro. They both have
a brushed aluminum body with a large trackpad and great
keyboards. Even the Surface Pro X, which is a tablet, has an addon
keyboard that is easy to type (that is, program) on.</p>
<p>I tried out the Lenovo Yoga 14 and it was solid, but I preferred the
brushed aluminum body of the Surface devices. I did not get a chance
to feel out the LG Gram.</p>
<p>I eliminated the Surface Laptop 3 because I like tablet mode. While
the Surface Laptop 3 is a touchscreen, it is not a 2-in-1 device and
does not have tablet mode.</p>
<p>And I eliminated the Surface Pro X because it is one of the first
mainstream Windows ARM devices. While Windows on ARM is now the same
operating system as Windows on a desktop, most consumer software ships
x86_64 (not ARM) binaries. Windows on ARM can emulate x86 but not yet
x86_64. I didn't feel like working around this on my primary personal
device.</p>
<p>I bought the 13.5", 7th generation i5 Surface Book 2 for $999. It
comes with 8GB DDR4 RAM and a 128GB SSD. I have had the device for two
weeks now and I use it at least 10 hours a day.</p>
<h3 id="keyboard">Keyboard</h3><p>The keyboard layout is standard, easy to use. The control, shift,
caps, function, and alt keys are big enough that it is easy to program
without staring at the keyboard. The up and down arrow keys are
smaller than would be nice. But they are easier for me to find than on
a 2019 Macbook Pro.</p>
<p>The function key is modal by default (like a Caps key) and indicates
if function is enabled with a small LED. I have never seen a function
key like this. I find it annoying when I turn it on.</p>
<p>And while there are built-in volume controls and a play/pause button,
there is no media forward/back button. I assigned
Ctrl+Windows+Alt+Left/Right to be media forward/back.</p>
<p>There is also no right Ctrl key. Instead there is a "menu key" which is
the equivalent of right-clicking... I guess. This is useless so I
mapped it back to another Ctrl key.</p>
<p>Unlike macOS, which needs an app like Spectacle, Windows default
window control shortcuts are great. Windows+Left to send to the left
half, Windows+Right to send to the right half, Windows+Up to make full
screen.</p>
<p>But macOS default swipe gestures are more intuitive: swipe left to go
backwards, swipe right to go forwards. So I mapped this back myself.</p>
<p><a href="https://gist.github.com/eatonphil/0a684561d599fcd94128ff462a5253b7">Here is my autohotkey
script.</a></p>
<h3 id="screen">Screen</h3><p>The 13.5" screen feels top-heavy but may not actually weigh more than
the keyboard/body. The bezel is larger than it feels like it should
be. But the camera is in the right location: top and center.</p>
<p>Additionally, the default behavior when attaching/detaching the screen
is to prompt you to enter/exit tablet mode rather than doing it for
you. This prompt is easy to click out of, and after doing so the option
to switch between modes disappears until you reattach and detach again.</p>
<p>The screen isn't flush with the body when you close it. Few marketing
pictures show you this, but here's
<a href="https://assets.pcmag.com/media/images/563021-microsoft-surface-book-2-15-inch.jpg?thumb=y">one</a>. This
makes me worry something may snap if the laptop is ever slammed
against a wall for some reason.</p>
<p>And fully open, it only goes back 120 degrees. This makes it hard to
look at if it is on your legs and your legs are up higher than 90
degrees.</p>
<p>Finally, the headphone jack is not on the body but on the screen. This
makes sense since the screen is detachable. But the jack is on the
top-right corner, further away than usual. This requires me to be
closer to the screen to feel like I am not pulling the screen when I
am wearing headphones.</p>
<h4 id="pen">Pen</h4><p>The Surface Pen is awesome and the screen's palm detection is too. I
have had a lot of fun drawing on it in Paint 3D. And it has been useful
in annotating mockups for work too.</p>
<p>It costs $100 and comes with a AAAA battery. It is magnetized and
sticks to the left side of the screen.</p>
<h3 id="body">Body</h3><p>As mentioned, the body is a brushed aluminum. It feels great. The
power input is magnetic, which is helpful. But it uses a novel
Surface-specific input rather than USB-C, so that sucks. A new charger
from Microsoft costs $100.</p>
<p>The speakers are as good as Macbook speakers were 5 years ago.
They don't have much bass. Additionally, these speakers get a little
distorted at top volume.</p>
<p>The battery lasts 7-8 hours without charging. While this is as
advertised, it is still disappointing for a new laptop in 2020 that is
only running Chrome, Spotify, and Windows Terminal.</p>
<h4 id="tablet">Tablet</h4><p>To release the screen from the body, there is a key on the function
row. However, it is not a hardware release. So when I accidentally
killed the battery while the screen was flipped, I couldn't detach the
screen after booting (to turn it back into a laptop) until after 10-20
minutes of charging.</p>
<p>The screen isn't easy to detach. It requires both hands lifting up
from the base of the screen to get enough leverage. You cannot pull up
from the top of the screen.</p>
<p>Aside from drawing apps, tablet mode apps on Windows aren't
great. Kindle for Windows on tablet is terrible. I got stuck in
Kindle's full screen mode and couldn't adjust the page size or exit
full screen mode without reverting back to laptop mode first.</p>
<p>Tablet mode also throws away the standard Windows menu and shortcuts
to give you a desktop of application cards. However, these cards don't
adapt to recent or frequent applications. After I deleted Candy Crush
and other built in apps I will never use, this desktop is blank except
for Edge and Groove Music. It is incredible how bad the tablet desktop
is. You have to use the full application list view every time you want
to open a new program.</p>
<h3 id="in-summary">In summary</h3><p>It's not a bad Windows machine for $1000. The body is great quality
and the pen/screen interaction is solid. But I'd like to see Windows
invest more in a useful tablet experience. And the detachable screen
comes at the cost of being a bit awkward. So I'd go with the Surface Pro X
or Surface Laptop 3 next time.</p>
<p>But above all I can't shake the expectation that a laptop built in
2020 running Gmail and Slack in Chrome, Spotify, and a terminal
application should last at least 10 hours.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a short post reviewing Microsoft's Surface Book 2<a href="https://t.co/0n6K3y6FBC">https://t.co/0n6K3y6FBC</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1241503107806384133?ref_src=twsrc%5Etfw">March 21, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/reviewing-the-surface-book-2.htmlWed, 18 Mar 2020 00:00:00 +0000
- Writing a SQL database from scratch in Go: 1. SELECT, INSERT, CREATE and a REPLhttp://notes.eatonphil.com/database-basics.html<p class="note">
Next in database basics:
<! forgive me, for I have sinned >
<br />
<a href="/database-basics-expressions-and-where.html">2. binary expressions and WHERE filters</a>
<br />
<a href="/database-basics-indexes.html">3. indexes</a>
<br />
<a href="/database-basics-a-database-sql-driver.html">4. a database/sql driver</a>
</p><p>In this series we'll write a rudimentary database from
scratch in Go. Project source code is available on
<a href="https://github.com/eatonphil/gosql">GitHub</a>.</p>
<p>In this first post we'll build enough of a parser to run some simple
<code>CREATE</code>, <code>INSERT</code>, and <code>SELECT</code>
queries. Then we'll build an in-memory backend
supporting <code>TEXT</code> and <code>INT</code> types and write a
basic REPL.</p>
<p>We'll be able to support the following interaction:</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="n">run</span><span class="w"> </span><span class="o">*</span><span class="p">.</span><span class="k">go</span>
<span class="n">Welcome</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">gosql</span><span class="p">.</span>
<span class="o">#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">INT</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">);</span>
<span class="n">ok</span>
<span class="o">#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'Phil'</span><span class="p">);</span>
<span class="n">ok</span>
<span class="o">#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="o">|</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">|</span>
<span class="o">====================</span>
<span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Phil</span><span class="w"> </span><span class="o">|</span>
<span class="n">ok</span>
<span class="o">#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="s1">'Kate'</span><span class="p">);</span>
<span class="n">ok</span>
<span class="o">#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span>
<span class="o">|</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">|</span>
<span class="o">====================</span>
<span class="o">|</span><span class="w"> </span><span class="n">Phil</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span>
<span class="o">|</span><span class="w"> </span><span class="n">Kate</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span>
<span class="n">ok</span>
</pre></div>
<p>The first stage will be to map a SQL source into a list of tokens
(lexing). Then we'll call parse functions to find individual SQL
statements (such as <code>SELECT</code>). These parse functions will
in turn call their own helper functions to find patterns of
recursively parseable chunks, keywords, symbols (like parentheses),
identifiers (like a table name), and numeric or string literals.</p>
<p>Then, we'll write an in-memory backend to do operations based on an
AST. Finally, we'll write a REPL to accept SQL from a CLI and pass it
to the in-memory backend.</p>
<p class="note">
This post assumes a basic understanding of parsing concepts. We
won't skip any code, but also won't go into great detail on why we
structure the code the way we do.
<br />
<br />
For a simpler introduction to parsing and parsing concepts,
see <a href="/writing-a-simple-json-parser.html">this post on
parsing JSON</a>.
</p><h3 id="lexing">Lexing</h3><p>The lexer is responsible for finding every distinct group of
characters in source code: tokens. This will consist primarily of
identifiers, numbers, strings, and symbols.</p>
<p class="note">
What follows is a second, more orthodox pass at lexing. The first
pass took a number of shortcuts and couldn't handle spaces in
strings, for example.
<br />
<br />
<a href="https://github.com/eatonphil/gosql/pull/2">Here is the
relevant pull request in gosql if you are curious.</a>
</p><p>The gist of the logic will be to pass control to a helper function for
each kind of token. If the helper function succeeds in finding a
token, it will return the token, the location for the lexer to start at
next, and true. The lexer will continue doing this until it reaches the
end of the source.</p>
<p>First off, we'll define a few types and constants for use
in <code>lexer.go</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">gosql</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"strings"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">location</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="kt">uint</span>
<span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="kt">uint</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="kt">string</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">selectKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"select"</span>
<span class="w"> </span><span class="nx">fromKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"from"</span>
<span class="w"> </span><span class="nx">asKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"as"</span>
<span class="w"> </span><span class="nx">tableKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"table"</span>
<span class="w"> </span><span class="nx">createKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"create"</span>
<span class="w"> </span><span class="nx">insertKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"insert"</span>
<span class="w"> </span><span class="nx">intoKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"into"</span>
<span class="w"> </span><span class="nx">valuesKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"values"</span>
<span class="w"> </span><span class="nx">intKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"int"</span>
<span class="w"> </span><span class="nx">textKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"text"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="kt">string</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">semicolonSymbol</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">";"</span>
<span class="w"> </span><span class="nx">asteriskSymbol</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"*"</span>
<span class="w"> </span><span class="nx">commaSymbol</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">","</span>
<span class="w"> </span><span class="nx">leftparenSymbol</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">"("</span>
<span class="w"> </span><span class="nx">rightparenSymbol</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">")"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">tokenKind</span><span class="w"> </span><span class="kt">uint</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">keywordKind</span><span class="w"> </span><span class="nx">tokenKind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>
<span class="w"> </span><span class="nx">symbolKind</span>
<span class="w"> </span><span class="nx">identifierKind</span>
<span class="w"> </span><span class="nx">stringKind</span>
<span class="w"> </span><span class="nx">numericKind</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">tokenKind</span>
<span class="w"> </span><span class="nx">loc</span><span class="w"> </span><span class="nx">location</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">pointer</span><span class="w"> </span><span class="kt">uint</span>
<span class="w"> </span><span class="nx">loc</span><span class="w"> </span><span class="nx">location</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="nx">equals</span><span class="p">(</span><span class="nx">other</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">other</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">other</span><span class="p">.</span><span class="nx">kind</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">lexer</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span>
</pre></div>
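<p>One detail in these definitions is easy to miss: <code>equals</code>
compares only <code>value</code> and <code>kind</code>, so two tokens
found at different locations still count as equal. Here is a minimal,
standalone sketch of that behavior. It copies the relevant types into
a <code>main</code> package purely for illustration; the sample tokens
are made up.</p>

```go
package main

import "fmt"

type location struct {
	line uint
	col  uint
}

type tokenKind uint

const (
	keywordKind tokenKind = iota
	symbolKind
	identifierKind
	stringKind
	numericKind
)

type token struct {
	value string
	kind  tokenKind
	loc   location
}

// equals ignores loc on purpose: only value and kind are compared.
func (t *token) equals(other *token) bool {
	return t.value == other.value && t.kind == other.kind
}

func main() {
	// Illustrative tokens, not taken from a real lexing run.
	a := &token{value: "select", kind: keywordKind, loc: location{line: 0, col: 0}}
	b := &token{value: "select", kind: keywordKind, loc: location{line: 3, col: 7}}
	c := &token{value: "select", kind: identifierKind}

	fmt.Println(a.equals(b)) // same value and kind, different location
	fmt.Println(a.equals(c)) // same value, different kind
}
```

<p>The first comparison succeeds and the second fails. Ignoring the
location is convenient for code that wants to ask "is this
the <code>select</code> keyword?" without caring where it appeared.</p>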
<p>Next we'll write out the main loop:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lex</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cursor</span><span class="p">{}</span>
<span class="nx">lex</span><span class="p">:</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">lexers</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">lexer</span><span class="p">{</span><span class="nx">lexKeyword</span><span class="p">,</span><span class="w"> </span><span class="nx">lexSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">lexString</span><span class="p">,</span><span class="w"> </span><span class="nx">lexNumeric</span><span class="p">,</span><span class="w"> </span><span class="nx">lexIdentifier</span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">l</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">lexers</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="nx">cur</span><span class="p">);</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="c1">// Omit nil tokens for valid, but empty syntax like newlines</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">continue</span><span class="w"> </span><span class="nx">lex</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">hint</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">""</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">hint</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">" after "</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">value</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">"Unable to lex token%s, at %d:%d"</span><span class="p">,</span><span class="w"> </span><span class="nx">hint</span><span class="p">,</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
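<p>To see the helper contract in isolation, here is a stripped-down
sketch of the same loop with two toy helpers. The
names <code>lexSpace</code>, <code>lexDigits</code>,
and <code>lexAll</code> are hypothetical and exist only for this
example; they are not part of gosql. The point is that a helper can
report success with a <code>nil</code> token to skip input without
emitting anything.</p>

```go
package main

import "fmt"

type cursor struct {
	pointer uint
}

type token struct {
	value string
}

// lexer mirrors the helper contract: token, next cursor, ok.
type lexer func(string, cursor) (*token, cursor, bool)

// lexSpace (hypothetical) consumes one space and returns a nil token,
// so the loop advances without appending anything.
func lexSpace(source string, ic cursor) (*token, cursor, bool) {
	if source[ic.pointer] != ' ' {
		return nil, ic, false
	}
	cur := ic
	cur.pointer++
	return nil, cur, true
}

// lexDigits (hypothetical) consumes a run of ASCII digits.
func lexDigits(source string, ic cursor) (*token, cursor, bool) {
	cur := ic
	for cur.pointer < uint(len(source)) && source[cur.pointer] >= '0' && source[cur.pointer] <= '9' {
		cur.pointer++
	}
	if cur.pointer == ic.pointer {
		return nil, ic, false
	}
	return &token{value: source[ic.pointer:cur.pointer]}, cur, true
}

// lexAll tries each helper in order and restarts from the new cursor
// as soon as one succeeds. If none succeeds, lexing fails.
func lexAll(source string) ([]string, error) {
	values := []string{}
	cur := cursor{}
lex:
	for cur.pointer < uint(len(source)) {
		for _, l := range []lexer{lexSpace, lexDigits} {
			if tok, next, ok := l(source, cur); ok {
				cur = next
				if tok != nil {
					values = append(values, tok.value)
				}
				continue lex
			}
		}
		return nil, fmt.Errorf("unable to lex at offset %d", cur.pointer)
	}
	return values, nil
}

func main() {
	values, err := lexAll("12 345")
	fmt.Println(values, err)

	_, err = lexAll("12 x")
	fmt.Println(err)
}
```

<p>The first call yields the two digit runs and no error. The second
fails at the <code>x</code> because neither helper accepts it.</p>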
<p>Then we'll write a helper for each kind of fundamental token.</p>
<h4 id="analyzing-numbers">Analyzing numbers</h4><p>Numbers are the most complex. So we'll refer to the <a href="https://www.postgresql.org/docs/current/sql-syntax-lexical.html">PostgreSQL
documentation (section
4.1.2.6)</a>
for what constitutes a valid number.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lexNumeric</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ic</span>
<span class="w"> </span><span class="nx">periodFound</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="nx">expMarkerFound</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">));</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span>
<span class="w"> </span><span class="nx">isDigit</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="sc">'0'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="sc">'9'</span>
<span class="w"> </span><span class="nx">isPeriod</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'.'</span>
<span class="w"> </span><span class="nx">isExpMarker</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'e'</span>
<span class="w"> </span><span class="c1">// Must start with a digit or period</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isDigit</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="p">!</span><span class="nx">isPeriod</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">periodFound</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">isPeriod</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isPeriod</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">periodFound</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">periodFound</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isExpMarker</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">expMarkerFound</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// No periods allowed after expMarker</span>
<span class="w"> </span><span class="nx">periodFound</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="nx">expMarkerFound</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="c1">// expMarker must be followed by digits</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cNext</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cNext</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'-'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">cNext</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'+'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isDigit</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// No characters accumulated</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">token</span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="p">:</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">],</span>
<span class="w"> </span><span class="nx">loc</span><span class="p">:</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">,</span>
<span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">numericKind</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
<h4 id="analyzing-strings">Analyzing strings</h4><p>Strings must start and end with a single apostrophe. They can contain
an apostrophe only if it is escaped by doubling it, that is, if it is
immediately followed by another apostrophe. We'll put this kind of
character-delimited lexing logic into a helper function so we can use
it again when analyzing identifiers.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lexCharacterDelimited</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ic</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">:])</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">));</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// SQL escapes are via double characters, not backslash.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">+</span><span class="mi">1</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">))</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Step past the closing delimiter so the next lexer starts after it</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">token</span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">value</span><span class="p">),</span>
<span class="w"> </span><span class="nx">loc</span><span class="p">:</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">,</span>
<span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">stringKind</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Escaped delimiter: skip the first of the pair, the append below keeps one copy</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">lexString</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">lexCharacterDelimited</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="sc">'\''</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
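<p>To see the doubled-delimiter rule in isolation, here is a minimal
sketch that works on a plain string instead of a cursor. The
<code>unquote</code> helper and its return values are hypothetical and
are not part of the lexer; they only illustrate how a doubled delimiter
collapses to one literal character and where the literal ends.</p>

```go
package main

import "fmt"

// unquote is a hypothetical helper, not part of the lexer above. It expects
// s to begin with delimiter and returns the unescaped contents together with
// the index just past the closing delimiter.
func unquote(s string, delimiter byte) (value string, end int, ok bool) {
	if len(s) == 0 || s[0] != delimiter {
		return "", 0, false
	}

	var out []byte
	for i := 1; i < len(s); i++ {
		if s[i] != delimiter {
			out = append(out, s[i])
			continue
		}

		// A doubled delimiter stands for one literal delimiter.
		if i+1 < len(s) && s[i+1] == delimiter {
			out = append(out, delimiter)
			i++ // skip the second delimiter of the pair
			continue
		}

		// A lone delimiter closes the literal.
		return string(out), i + 1, true
	}

	// Ran out of input before finding a closing delimiter.
	return "", 0, false
}

func main() {
	value, end, ok := unquote("'it''s' rest", '\'')
	fmt.Println(value, end, ok)
}
```

<p>The same function handles double-quoted identifiers when called
with <code>'"'</code> as the delimiter, which is the reason for
parameterizing the delimiter in the first place.</p>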
<h4 id="analyzing-symbols-and-keywords">Analyzing symbols and keywords</h4><p>Symbols come from a fixed set of strings, so they're easy
to compare against. Whitespace should be thrown away.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lexSymbol</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span>
<span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ic</span>
<span class="w"> </span><span class="c1">// Will get overwritten later if not an ignored syntax</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Syntax that should be thrown away</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'\n'</span><span class="p">:</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">line</span><span class="o">++</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">fallthrough</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">'\t'</span><span class="p">:</span>
<span class="w"> </span><span class="k">fallthrough</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">' '</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Syntax that should be kept</span>
<span class="w"> </span><span class="nx">symbols</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">symbol</span><span class="p">{</span>
<span class="w"> </span><span class="nx">commaSymbol</span><span class="p">,</span>
<span class="w"> </span><span class="nx">leftParenSymbol</span><span class="p">,</span>
<span class="w"> </span><span class="nx">rightParenSymbol</span><span class="p">,</span>
<span class="w"> </span><span class="nx">semicolonSymbol</span><span class="p">,</span>
<span class="w"> </span><span class="nx">asteriskSymbol</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">symbols</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">options</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Use `ic`, not `cur`</span>
<span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">longestMatch</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="nx">options</span><span class="p">)</span>
<span class="w"> </span><span class="c1">// Unknown character</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">match</span><span class="p">))</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">match</span><span class="p">))</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">token</span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">match</span><span class="p">,</span>
<span class="w"> </span><span class="nx">loc</span><span class="p">:</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">,</span>
<span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">symbolKind</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
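<p>The location bookkeeping in the whitespace cases is easy to get
wrong, so here is a small sketch of just that part. The
<code>location</code> type stands in for the cursor's
<code>loc</code> field, and <code>advance</code> and
<code>locate</code> are hypothetical helpers, assuming zero-based
lines and columns as in the code above.</p>

```go
package main

import "fmt"

// location is a stand-in for the loc field on the cursor used above.
type location struct {
	line uint
	col  uint
}

// advance returns the location just after consuming the byte c. A newline
// starts a new line at column 0; any other byte moves one column right.
func advance(loc location, c byte) location {
	if c == '\n' {
		return location{line: loc.line + 1, col: 0}
	}
	loc.col++
	return loc
}

// locate is a hypothetical helper that replays advance over the first
// offset bytes of source.
func locate(source string, offset int) location {
	var loc location
	for i := 0; i < offset && i < len(source); i++ {
		loc = advance(loc, source[i])
	}
	return loc
}

func main() {
	source := "select\n  *"
	loc := locate(source, 9) // byte offset of the asterisk
	fmt.Printf("line %d, col %d\n", loc.line, loc.col)
}
```

<p>Keeping the location on the cursor, as <code>lexSymbol</code> does,
avoids rescanning the source like this; the sketch only shows what the
stored values mean.</p>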
<p>Keywords are even simpler, and use the same <code>longestMatch</code>
helper.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lexKeyword</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ic</span>
<span class="w"> </span><span class="nx">keywords</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">keyword</span><span class="p">{</span>
<span class="w"> </span><span class="nx">selectKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="nx">insertKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="nx">valuesKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="nx">tableKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="nx">createKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="nx">whereKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="nx">fromKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="nx">intoKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="nx">textKeyword</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">keywords</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">options</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">k</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">longestMatch</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="nx">options</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">""</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">match</span><span class="p">))</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">match</span><span class="p">))</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">token</span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">match</span><span class="p">,</span>
<span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">keywordKind</span><span class="p">,</span>
<span class="w"> </span><span class="nx">loc</span><span class="p">:</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
<p>And finally we implement the <code>longestMatch</code> helper:</p>
<div class="highlight"><pre><span></span><span class="c1">// longestMatch iterates through a source string starting at the given</span>
<span class="c1">// cursor to find the longest matching substring among the provided</span>
<span class="c1">// options</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">longestMatch</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">skipList</span><span class="w"> </span><span class="p">[]</span><span class="kt">int</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ic</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">ToLower</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]))</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span>
<span class="w"> </span><span class="nx">match</span><span class="p">:</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">option</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">skip</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">skipList</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">skip</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span><span class="w"> </span><span class="nx">match</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Deal with cases like INT vs INTO</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">option</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">skipList</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">skipList</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">option</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">match</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">option</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">sharesPrefix</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">option</span><span class="p">[:</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">-</span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span>
<span class="w"> </span><span class="nx">tooLong</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">option</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">tooLong</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">!</span><span class="nx">sharesPrefix</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">skipList</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">skipList</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">skipList</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">options</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">match</span>
<span class="p">}</span>
</pre></div>
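<p>If the skip-list bookkeeping is hard to follow, the same idea can be
sketched with <code>strings.HasPrefix</code>. The
<code>longestPrefix</code> function below is a hypothetical, simplified
stand-in for <code>longestMatch</code>: it takes the remaining source
text rather than a cursor and assumes the options are already
lowercase.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// longestPrefix is a hypothetical, simplified stand-in for longestMatch.
// It returns the longest option that the source starts with, ignoring case,
// or the empty string if no option matches.
func longestPrefix(source string, options []string) string {
	lowered := strings.ToLower(source)

	var match string
	for _, option := range options {
		// Prefer the longer option so "into" wins over "int".
		if strings.HasPrefix(lowered, option) && len(option) > len(match) {
			match = option
		}
	}
	return match
}

func main() {
	options := []string{"int", "into", "insert"}
	fmt.Println(longestPrefix("INTO users", options))
	fmt.Println(longestPrefix("int)", options))
	fmt.Println(longestPrefix("users", options) == "")
}
```

<p>This version lowercases the whole remaining source on every call,
which the byte-at-a-time loop in <code>longestMatch</code> avoids; it
trades that efficiency for readability.</p>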
<h4 id="analyzing-identifiers">Analyzing identifiers</h4><p>An identifier is either a double-quoted string or a group of
characters starting with an alphabetical character and possibly
containing numbers and underscores.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lexIdentifier</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Handle separately if is a double-quoted identifier</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">lexCharacterDelimited</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="sc">'"'</span><span class="p">);</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// lexCharacterDelimited always tags its result as a string</span>
<span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">identifierKind</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ic</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span>
<span class="w"> </span><span class="c1">// Other characters count too, but ignoring non-ascii for now</span>
<span class="w"> </span><span class="nx">isAlphabetical</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="sc">'A'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="sc">'Z'</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="sc">'a'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="sc">'z'</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isAlphabetical</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">{</span><span class="nx">c</span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">));</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span>
<span class="w"> </span><span class="c1">// Other characters count too, but ignoring non-ascii for now</span>
<span class="w"> </span><span class="nx">isAlphabetical</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="sc">'A'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="sc">'Z'</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="sc">'a'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="sc">'z'</span><span class="p">)</span>
<span class="w"> </span><span class="nx">isNumeric</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="sc">'0'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="sc">'9'</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isAlphabetical</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">isNumeric</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'$'</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">'_'</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span>
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">token</span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Unquoted identifiers are case-insensitive</span>
<span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">ToLower</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">value</span><span class="p">)),</span>
<span class="w"> </span><span class="nx">loc</span><span class="p">:</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">,</span>
<span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">identifierKind</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
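<p>To see the unquoted identifier rule in isolation, here is a minimal
standalone sketch. The helper name <code>scanIdentifier</code> is
hypothetical and not part of gosql. It assumes ASCII input, skips the
double-quoted case and location tracking, and only mirrors the
character rule used above.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// scanIdentifier is a hypothetical, simplified stand-in for lexIdentifier.
// It returns the lowercased identifier at the start of source and the
// number of bytes consumed, or ok=false if source does not start with one.
func scanIdentifier(source string) (value string, consumed int, ok bool) {
	isAlpha := func(c byte) bool {
		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
	}
	isDigit := func(c byte) bool { return c >= '0' && c <= '9' }

	// The first character must be alphabetical.
	if len(source) == 0 || !isAlpha(source[0]) {
		return "", 0, false
	}

	// Later characters may also be digits, '$', or '_'.
	i := 1
	for i < len(source) {
		c := source[i]
		if !(isAlpha(c) || isDigit(c) || c == '$' || c == '_') {
			break
		}
		i++
	}

	// Unquoted identifiers are case-insensitive, so normalize to lowercase.
	return strings.ToLower(source[:i]), i, true
}

func main() {
	for _, s := range []string{"User_Id2 FROM t", "9lives", "a$b,"} {
		v, n, ok := scanIdentifier(s)
		fmt.Printf("%q -> value=%q consumed=%d ok=%v\n", s, v, n, ok)
	}
}
```

<p>Returning the number of bytes consumed plays the same role as the
returned cursor in the real lexer: the caller knows where to resume.</p>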
<p>And that's it for the lexer! If you copy
<a href="https://github.com/eatonphil/gosql/blob/master/lexer_test.go">lexer_test.go</a>
from the main project, the tests should now pass.</p>
<h3 id="ast-model">AST model</h3><p>At the highest level, an AST is a collection of statements:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">Ast</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Statements</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">Statement</span>
<span class="p">}</span>
</pre></div>
<p>A statement, for now, is one of <code>INSERT</code>,
<code>CREATE</code>, or <code>SELECT</code>:</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">AstKind</span><span class="w"> </span><span class="kt">uint</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">SelectKind</span><span class="w"> </span><span class="nx">AstKind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>
<span class="w"> </span><span class="nx">CreateTableKind</span>
<span class="w"> </span><span class="nx">InsertKind</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">Statement</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">SelectStatement</span><span class="w"> </span><span class="o">*</span><span class="nx">SelectStatement</span>
<span class="w"> </span><span class="nx">CreateTableStatement</span><span class="w"> </span><span class="o">*</span><span class="nx">CreateTableStatement</span>
<span class="w"> </span><span class="nx">InsertStatement</span><span class="w"> </span><span class="o">*</span><span class="nx">InsertStatement</span>
<span class="w"> </span><span class="nx">Kind</span><span class="w"> </span><span class="nx">AstKind</span>
<span class="p">}</span>
</pre></div>
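<p>The <code>Kind</code> field tells a consumer which of the pointer
fields is populated. As a rough sketch of how that might be used, here
is a hypothetical <code>describe</code> helper (not part of gosql). The
empty statement structs are placeholders so the example compiles on
its own.</p>

```go
package main

import "fmt"

// Placeholder statement types so this sketch compiles standalone.
type SelectStatement struct{}
type CreateTableStatement struct{}
type InsertStatement struct{}

type AstKind uint

const (
	SelectKind AstKind = iota
	CreateTableKind
	InsertKind
)

type Statement struct {
	SelectStatement      *SelectStatement
	CreateTableStatement *CreateTableStatement
	InsertStatement      *InsertStatement
	Kind                 AstKind
}

// describe is a hypothetical helper that switches on Kind to decide
// which statement field is meaningful.
func describe(stmt *Statement) string {
	switch stmt.Kind {
	case SelectKind:
		return "select"
	case CreateTableKind:
		return "create table"
	case InsertKind:
		return "insert"
	}
	return "unknown"
}

func main() {
	stmt := &Statement{Kind: InsertKind, InsertStatement: &InsertStatement{}}
	fmt.Println(describe(stmt))
}
```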
<h4 id="insert">INSERT</h4><p>An insert statement, for now, has a table name and a list of values to
insert:</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">InsertStatement</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">table</span><span class="w"> </span><span class="nx">token</span>
<span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">expression</span>
<span class="p">}</span>
</pre></div>
<p>An expression is a literal token or (in the future) a function call or
inline operation:</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">expressionKind</span><span class="w"> </span><span class="kt">uint</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="nx">expressionKind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">expression</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">literal</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span>
<span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">expressionKind</span>
<span class="p">}</span>
</pre></div>
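<p>As an illustration, this is roughly what the AST for
<code>INSERT INTO users VALUES (1, 'Phil')</code> could look like when
built by hand. The <code>token</code> type here is a simplified
stand-in (no location, only three kinds) and <code>buildInsert</code>
is a hypothetical helper. Both are assumptions made so the sketch is
self-contained.</p>

```go
package main

import "fmt"

// Simplified stand-ins for the lexer's token type (location omitted).
type tokenKind uint

const (
	identifierKind tokenKind = iota
	stringKind
	numericKind
)

type token struct {
	value string
	kind  tokenKind
}

type expressionKind uint

const (
	literalKind expressionKind = iota
)

type expression struct {
	literal *token
	kind    expressionKind
}

type InsertStatement struct {
	table  token
	values *[]*expression
}

// buildInsert constructs, by hand, the AST for:
//   INSERT INTO users VALUES (1, 'Phil')
func buildInsert() *InsertStatement {
	values := []*expression{
		{literal: &token{value: "1", kind: numericKind}, kind: literalKind},
		{literal: &token{value: "Phil", kind: stringKind}, kind: literalKind},
	}
	return &InsertStatement{
		table:  token{value: "users", kind: identifierKind},
		values: &values,
	}
}

func main() {
	stmt := buildInsert()
	fmt.Println("table:", stmt.table.value)
	for i, e := range *stmt.values {
		fmt.Printf("value %d: %s\n", i, e.literal.value)
	}
}
```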
<h4 id="create">CREATE</h4><p>A create statement, for now, has a table name and a list of column
names and types:</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">columnDefinition</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="nx">token</span>
<span class="w"> </span><span class="nx">datatype</span><span class="w"> </span><span class="nx">token</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">CreateTableStatement</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="nx">token</span>
<span class="w"> </span><span class="nx">cols</span><span class="w"> </span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">columnDefinition</span>
<span class="p">}</span>
</pre></div>
<h4 id="select">SELECT</h4><p>A select statement, for now, has a table name and a list of column
names:</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">SelectStatement</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">item</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">expression</span>
<span class="w"> </span><span class="nx">from</span><span class="w"> </span><span class="nx">token</span>
<span class="p">}</span>
</pre></div>
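<p>One quick way to sanity-check the shape of a
<code>SelectStatement</code> is to turn it back into SQL text. The
<code>render</code> helper below is hypothetical, and the
<code>token</code> and <code>expression</code> types are trimmed down
to the one field the sketch needs.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// Trimmed-down stand-ins: only the fields this sketch uses.
type token struct {
	value string
}

type expression struct {
	literal *token
}

type SelectStatement struct {
	item []*expression
	from token
}

// render is a hypothetical helper that prints the statement back as SQL.
// It assumes every item is a literal expression.
func render(s *SelectStatement) string {
	cols := make([]string, 0, len(s.item))
	for _, e := range s.item {
		cols = append(cols, e.literal.value)
	}
	return "SELECT " + strings.Join(cols, ", ") + " FROM " + s.from.value
}

func main() {
	stmt := &SelectStatement{
		item: []*expression{
			{literal: &token{value: "id"}},
			{literal: &token{value: "name"}},
		},
		from: token{value: "users"},
	}
	fmt.Println(render(stmt))
}
```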
<p>And that's it for the AST.</p>
<h3 id="parsing">Parsing</h3><p>The <code>Parse</code> entrypoint will lex the source into a list of
tokens and attempt to parse statements, separated by semicolons, until
it reaches the last token.</p>
<p>In general our strategy will be to increment and pass around a cursor
containing the current position of unparsed tokens. Each helper will
return the new cursor that the caller should start from.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"errors"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">k</span><span class="w"> </span><span class="nx">keyword</span><span class="p">)</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">token</span><span class="p">{</span>
<span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">keywordKind</span><span class="p">,</span>
<span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">k</span><span class="p">),</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">symbol</span><span class="p">)</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">token</span><span class="p">{</span>
<span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">symbolKind</span><span class="p">,</span>
<span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">])</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"[%d,%d]: %s, got: %s\n"</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">Parse</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Ast</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">lex</span><span class="p">(</span><span class="nx">source</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">Ast</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">stmt</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">semicolonSymbol</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected statement"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">"Failed to parse, expected statement"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="nx">a</span><span class="p">.</span><span class="nx">Statements</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">a</span><span class="p">.</span><span class="nx">Statements</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="p">)</span>
<span class="w"> </span><span class="nx">atLeastOneSemicolon</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">semicolonSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="nx">atLeastOneSemicolon</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">atLeastOneSemicolon</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected semi-colon delimiter between statements"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">"Missing semi-colon between statements"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
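<p>The cursor convention is easier to see on a tiny example. In this
sketch tokens are plain strings, and <code>expectWord</code> and
<code>parseFrom</code> are hypothetical helpers, not gosql functions.
The point is only the shape of the return values: on success a helper
returns the cursor just past what it consumed, and on failure it
returns the cursor it was given.</p>

```go
package main

import "fmt"

// expectWord reports whether the token at cursor equals want.
// Tokens are plain strings here to keep the sketch short.
func expectWord(tokens []string, cursor uint, want string) bool {
	if cursor >= uint(len(tokens)) {
		return false
	}
	return tokens[cursor] == want
}

// parseFrom is a hypothetical helper that parses `from <table>`.
// On success it returns the table name and the cursor just past it.
// On failure it returns the initial cursor so the caller can try
// something else from the same position.
func parseFrom(tokens []string, initialCursor uint) (string, uint, bool) {
	cursor := initialCursor

	if !expectWord(tokens, cursor, "from") {
		return "", initialCursor, false
	}
	cursor++

	if cursor >= uint(len(tokens)) {
		return "", initialCursor, false
	}
	table := tokens[cursor]
	cursor++

	return table, cursor, true
}

func main() {
	tokens := []string{"select", "id", "from", "users", ";"}

	table, next, ok := parseFrom(tokens, 2)
	fmt.Println(table, next, ok)

	_, next, ok = parseFrom(tokens, 0)
	fmt.Println(next, ok)
}
```

<p>Because a failed helper hands back the original cursor, a caller can
try several alternatives in a row from the same position without any
cleanup.</p>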
<h4 id="parsing-statements">Parsing statements</h4><p>Each statement will be one of <code>INSERT</code>,
<code>CREATE</code>, or <code>SELECT</code>. The
<code>parseStatement</code> helper will call a helper on each of these
statement types and return <code>true</code> if one of them succeeds
in parsing.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="c1">// Look for a SELECT statement</span>
<span class="w"> </span><span class="nx">semicolonToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">semicolonSymbol</span><span class="p">)</span>
<span class="w"> </span><span class="nx">slct</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseSelectStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">semicolonToken</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Statement</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Kind</span><span class="p">:</span><span class="w"> </span><span class="nx">SelectKind</span><span class="p">,</span>
<span class="w"> </span><span class="nx">SelectStatement</span><span class="p">:</span><span class="w"> </span><span class="nx">slct</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Look for an INSERT statement</span>
<span class="w"> </span><span class="nx">inst</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseInsertStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">semicolonToken</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Statement</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Kind</span><span class="p">:</span><span class="w"> </span><span class="nx">InsertKind</span><span class="p">,</span>
<span class="w"> </span><span class="nx">InsertStatement</span><span class="p">:</span><span class="w"> </span><span class="nx">inst</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Look for a CREATE statement</span>
<span class="w"> </span><span class="nx">crtTbl</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseCreateTableStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">semicolonToken</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Statement</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Kind</span><span class="p">:</span><span class="w"> </span><span class="nx">CreateTableKind</span><span class="p">,</span>
<span class="w"> </span><span class="nx">CreateTableStatement</span><span class="p">:</span><span class="w"> </span><span class="nx">crtTbl</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="p">}</span>
</pre></div>
<h4 id="parsing-select-statements">Parsing select statements</h4><p>Parsing <code>SELECT</code> statements is easy. We'll look for the
following token pattern:</p>
<ol>
<li><code>SELECT</code></li>
<li><code>$expression [, ...]</code></li>
<li><code>FROM</code></li>
<li><code>$table-name</code></li>
</ol>
<p>Sketching that out we get:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseSelectStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">selectKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="nx">slct</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">SelectStatement</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">exps</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseExpressions</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">fromKeyword</span><span class="p">),</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">})</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">exps</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">fromKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="nx">from</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">identifierKind</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected table name after FROM"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">from</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">slct</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
<p>The <code>parseToken</code> helper checks whether the token at the
cursor has a particular token kind. On a match it returns that token and
the cursor advanced by one.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">tokenKind</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">current</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">current</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">current</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="p">}</span>
</pre></div>
<p>The <code>parseExpressions</code> helper will look for expressions
separated by commas until one of the given delimiters is found. It uses
existing helpers plus <code>parseExpression</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseExpressions</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">expression</span><span class="p">{}</span>
<span class="nx">outer</span><span class="p">:</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Look for delimiter</span>
<span class="w"> </span><span class="nx">current</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span class="nx">current</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="nx">outer</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Look for comma</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">exps</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">commaSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected comma"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Look for expression</span>
<span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">commaSymbol</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected expression"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">exps</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">exps</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
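<p>The shape of that loop may be easier to follow without the token
types. The following is a simplified sketch, not the real helper:
<code>parseList</code> is a hypothetical name, it works on plain strings,
and it accepts any string as an item where the real code calls
<code>parseExpression</code>.</p>

```go
package main

// parseList is a simplified, string-based stand-in for parseExpressions.
// It collects comma-separated items until it sees one of the delimiters.
func parseList(tokens []string, initialCursor uint, delimiters []string) ([]string, uint, bool) {
	cursor := initialCursor
	items := []string{}
outer:
	for {
		// Running out of tokens before a delimiter is a failure.
		if cursor >= uint(len(tokens)) {
			return nil, initialCursor, false
		}
		// Stop (without consuming) once a delimiter is found.
		current := tokens[cursor]
		for _, d := range delimiters {
			if d == current {
				break outer
			}
		}
		// Every item after the first must be preceded by a comma.
		if len(items) > 0 {
			if current != "," {
				return nil, initialCursor, false
			}
			cursor++
			if cursor >= uint(len(tokens)) {
				return nil, initialCursor, false
			}
		}
		// Assumption: any string counts as an item in this sketch.
		items = append(items, tokens[cursor])
		cursor++
	}
	return items, cursor, true
}
```

<p>Note that the delimiter itself is not consumed. The returned cursor
points at it, which is why <code>parseSelectStatement</code> can check
for <code>FROM</code> right after the expression list.</p>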
<p>The <code>parseExpression</code> helper (for now) will look for a
numeric, string, or identifier token.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="nx">kinds</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">tokenKind</span><span class="p">{</span><span class="nx">identifierKind</span><span class="p">,</span><span class="w"> </span><span class="nx">numericKind</span><span class="p">,</span><span class="w"> </span><span class="nx">stringKind</span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">kinds</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">kind</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">expression</span><span class="p">{</span>
<span class="w"> </span><span class="nx">literal</span><span class="p">:</span><span class="w"> </span><span class="nx">t</span><span class="p">,</span>
<span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">literalKind</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="p">}</span>
</pre></div>
<p>And that's it for parsing a <code>SELECT</code> statement!</p>
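<p>As a rough illustration, this is the value the code above would be
expected to build for <code>SELECT id, name FROM users</code>, assuming
the lexer emits <code>id</code>, <code>name</code>, and <code>users</code>
as identifier tokens. The type definitions are trimmed-down assumptions
inferred from the fields the parser assigns, and
<code>exampleSelectAST</code> is a hypothetical helper.</p>

```go
package main

// Simplified stand-ins for the lexer and AST types; the field names
// follow what the parser code above reads and writes.
type tokenKind uint

const (
	keywordKind tokenKind = iota
	identifierKind
)

type token struct {
	value string
	kind  tokenKind
}

type expressionKind uint

const literalKind expressionKind = iota

type expression struct {
	literal *token
	kind    expressionKind
}

type SelectStatement struct {
	item []*expression
	from token
}

// exampleSelectAST builds by hand what parsing
// `SELECT id, name FROM users` should produce.
func exampleSelectAST() *SelectStatement {
	return &SelectStatement{
		item: []*expression{
			{literal: &token{value: "id", kind: identifierKind}, kind: literalKind},
			{literal: &token{value: "name", kind: identifierKind}, kind: literalKind},
		},
		from: token{value: "users", kind: identifierKind},
	}
}
```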
<h4 id="parsing-insert-statements">Parsing insert statements</h4><p>We'll look for the following token pattern:</p>
<ol>
<li><code>INSERT</code></li>
<li><code>INTO</code></li>
<li><code>$table-name</code></li>
<li><code>VALUES</code></li>
<li><code>(</code></li>
<li><code>$expression [, ...]</code></li>
<li><code>)</code></li>
</ol>
<p>With the existing helpers, this is straightforward to sketch out:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseInsertStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">InsertStatement</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="c1">// Look for INSERT</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">insertKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="c1">// Look for INTO</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">intoKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected INTO"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="c1">// Look for table name</span>
<span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">identifierKind</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected table name"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="c1">// Look for VALUES</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">valuesKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected VALUES"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="c1">// Look for left paren</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">leftparenSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected left paren"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="c1">// Look for expression list</span>
<span class="w"> </span><span class="nx">values</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseExpressions</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightparenSymbol</span><span class="p">)})</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="c1">// Look for right paren</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightparenSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected right paren"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">InsertStatement</span><span class="p">{</span>
<span class="w"> </span><span class="nx">table</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">,</span>
<span class="w"> </span><span class="nx">values</span><span class="p">:</span><span class="w"> </span><span class="nx">values</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
<p>And that's it for parsing an <code>INSERT</code> statement!</p>
<h4 id="parsing-create-statements">Parsing create statements</h4><p>Finally, for create statements we'll look for the following token
pattern:</p>
<ol>
<li><code>CREATE</code></li>
<li><code>TABLE</code></li>
<li><code>$table-name</code></li>
<li><code>(</code></li>
<li><code>[$column-name $column-type [, ...]]</code></li>
<li><code>)</code></li>
</ol>
<p>Sketching that out with a new <code>parseColumnDefinitions</code>
helper we get:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseCreateTableStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">CreateTableStatement</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">createKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">tableKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">identifierKind</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected table name"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">leftparenSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected left parenthesis"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="nx">cols</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseColumnDefinitions</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightparenSymbol</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightparenSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected right parenthesis"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">CreateTableStatement</span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">name</span><span class="p">,</span>
<span class="w"> </span><span class="nx">cols</span><span class="p">:</span><span class="w"> </span><span class="nx">cols</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
<p>The <code>parseColumnDefinitions</code> helper will look for column
names, each followed by a column type, separated by commas and ending
with some delimiter:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseColumnDefinitions</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">columnDefinition</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span>
<span class="w"> </span><span class="nx">cds</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">columnDefinition</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Look for a delimiter</span>
<span class="w"> </span><span class="nx">current</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span class="nx">current</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Look for a comma</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">cds</span><span class="p">)</span><span class="w"> </span><span class="p">></span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">commaSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected comma"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="o">++</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Look for a column name</span>
<span class="w"> </span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">identifierKind</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected column name"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="c1">// Look for a column type</span>
<span class="w"> </span><span class="nx">ty</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">keywordKind</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">"Expected column type"</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span>
<span class="w"> </span><span class="nx">cds</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">cds</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">columnDefinition</span><span class="p">{</span>
<span class="w"> </span><span class="nx">name</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">id</span><span class="p">,</span>
<span class="w"> </span><span class="nx">datatype</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">ty</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">cds</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
</pre></div>
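<p>To see the shape of that loop in isolation, here is a minimal sketch
that assumes tokens are plain strings rather than the <code>token</code>
structs used above. The names <code>columnPair</code> and
<code>parseColumnPairs</code> are hypothetical and not part of the
project, and unlike the real helper this version does not check token
kinds.</p>

```go
package main

import "fmt"

// columnPair is a hypothetical, simplified stand-in for columnDefinition.
type columnPair struct {
	name     string
	datatype string
}

// parseColumnPairs collects name/type pairs separated by commas until it
// reaches the delimiter. Like the helper above, it leaves the cursor
// pointing at the delimiter and resets to initialCursor on failure.
func parseColumnPairs(tokens []string, initialCursor uint, delimiter string) ([]columnPair, uint, bool) {
	cursor := initialCursor
	pairs := []columnPair{}
	for {
		if cursor >= uint(len(tokens)) {
			return nil, initialCursor, false
		}
		// Look for a delimiter
		if tokens[cursor] == delimiter {
			break
		}
		// Look for a comma between definitions
		if len(pairs) > 0 {
			if tokens[cursor] != "," {
				return nil, initialCursor, false
			}
			cursor++
		}
		// A definition needs two tokens: a name and a type
		if cursor+1 >= uint(len(tokens)) {
			return nil, initialCursor, false
		}
		pairs = append(pairs, columnPair{name: tokens[cursor], datatype: tokens[cursor+1]})
		cursor += 2
	}
	return pairs, cursor, true
}

func main() {
	tokens := []string{"id", "int", ",", "name", "text", ")"}
	pairs, cursor, ok := parseColumnPairs(tokens, 0, ")")
	fmt.Println(pairs, cursor, ok)
}
```

<p>For the six tokens in <code>main</code>, the returned cursor is 5, the
index of the closing paren, which is why the caller still has to check
for the right paren and advance past it.</p>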
<p>And that's it for parsing! If you copy
<a href="https://github.com/eatonphil/gosql/blob/master/parser_test.go">parser_test.go</a>
from the main project, the tests should now pass.</p>
<h3 id="an-in-memory-backend">An in-memory backend</h3><p>Our in-memory backend should conform to a general backend interface
that allows a user to create, select, and insert data:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="s">"errors"</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">ColumnType</span><span class="w"> </span><span class="kt">uint</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="nx">ColumnType</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>
<span class="w"> </span><span class="nx">IntType</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">Cell</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">AsText</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="nx">AsInt</span><span class="p">()</span><span class="w"> </span><span class="kt">int32</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">Results</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Columns</span><span class="w"> </span><span class="p">[]</span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Type</span><span class="w"> </span><span class="nx">ColumnType</span>
<span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">Rows</span><span class="w"> </span><span class="p">[][]</span><span class="nx">Cell</span>
<span class="p">}</span>
<span class="kd">var</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">ErrTableDoesNotExist</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">"Table does not exist"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">ErrColumnDoesNotExist</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">"Column does not exist"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">ErrInvalidSelectItem</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">"Select item is not valid"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">ErrInvalidDatatype</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">"Invalid datatype"</span><span class="p">)</span>
<span class="w"> </span><span class="nx">ErrMissingValues</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">"Missing values"</span><span class="p">)</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">Backend</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">CreateTable</span><span class="p">(</span><span class="o">*</span><span class="nx">CreateTableStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="nx">Insert</span><span class="p">(</span><span class="o">*</span><span class="nx">InsertStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span>
<span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>This leaves us room in the future for a disk-backed backend.</p>
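<p>As a rough illustration of why the interface helps, the sketch below
uses a pared-down, hypothetical <code>rowStore</code> interface (not the
<code>Backend</code> interface above) with one in-memory implementation.
The <code>load</code> function only knows about the interface, so a
disk-backed implementation could be passed in without changing it.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// rowStore is a hypothetical, pared-down stand-in for Backend.
type rowStore interface {
	Insert(table string, row []string) error
	Count(table string) (int, error)
}

var errNoTable = errors.New("table does not exist")

// memStore keeps every table's rows in a map.
type memStore struct {
	tables map[string][][]string
}

func (m *memStore) Insert(table string, row []string) error {
	rows, ok := m.tables[table]
	if !ok {
		return errNoTable
	}
	m.tables[table] = append(rows, row)
	return nil
}

func (m *memStore) Count(table string) (int, error) {
	rows, ok := m.tables[table]
	if !ok {
		return 0, errNoTable
	}
	return len(rows), nil
}

// load depends only on the interface, never on memStore itself.
func load(s rowStore, table string, rows [][]string) (int, error) {
	for _, r := range rows {
		if err := s.Insert(table, r); err != nil {
			return 0, err
		}
	}
	return s.Count(table)
}

func main() {
	s := &memStore{tables: map[string][][]string{"users": {}}}
	n, err := load(s, "users", [][]string{{"1", "Phil"}, {"2", "Kate"}})
	fmt.Println(n, err)
}
```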
<h4 id="memory-layout">Memory layout</h4><p>Our in-memory backend should store a list of tables. Each table
will have a list of columns and rows. Each column will have a name and
type. Each row will have a list of byte arrays.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"bytes"</span>
<span class="w"> </span><span class="s">"encoding/binary"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"strconv"</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">AsInt</span><span class="p">()</span><span class="w"> </span><span class="kt">int32</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="kt">int32</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">NewBuffer</span><span class="p">(</span><span class="nx">mc</span><span class="p">),</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">BigEndian</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">i</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">AsText</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">mc</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">table</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span>
<span class="w"> </span><span class="nx">columnTypes</span><span class="w"> </span><span class="p">[]</span><span class="nx">ColumnType</span>
<span class="w"> </span><span class="nx">rows</span><span class="w"> </span><span class="p">[][]</span><span class="nx">MemoryCell</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">MemoryBackend</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">tables</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="nx">table</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">NewMemoryBackend</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">MemoryBackend</span><span class="p">{</span>
<span class="w"> </span><span class="nx">tables</span><span class="p">:</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="nx">table</span><span class="p">{},</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
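<p><code>AsInt</code> assumes a cell's bytes hold a big-endian
<code>int32</code>. Here is a minimal sketch of the matching encode step
and a round trip. The helper names <code>encodeInt32</code> and
<code>decodeInt32</code> are hypothetical, and <code>decodeInt32</code>
returns an error where <code>AsInt</code> panics.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeInt32 is a hypothetical helper that serializes i as four
// big-endian bytes, the layout AsInt expects to read back.
func encodeInt32(i int32) ([]byte, error) {
	buf := new(bytes.Buffer)
	if err := binary.Write(buf, binary.BigEndian, i); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeInt32 mirrors MemoryCell.AsInt but reports failures as errors.
func decodeInt32(b []byte) (int32, error) {
	var i int32
	err := binary.Read(bytes.NewBuffer(b), binary.BigEndian, &i)
	return i, err
}

func main() {
	b, err := encodeInt32(258)
	if err != nil {
		panic(err)
	}
	fmt.Println(b) // [0 0 1 2]

	i, err := decodeInt32(b)
	if err != nil {
		panic(err)
	}
	fmt.Println(i) // 258
}
```

<p>Text cells need no such step since <code>AsText</code> converts the
bytes directly to a string.</p>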
<h4 id="implementing-create-table-support">Implementing create table support</h4><p>When creating a table, we'll make a new entry in the backend tables
map. Then we'll create columns as specified by the AST.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">crt</span><span class="w"> </span><span class="o">*</span><span class="nx">CreateTableStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">table</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">crt</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&</span><span class="nx">t</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">crt</span><span class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="o">*</span><span class="nx">crt</span><span class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="nx">ColumnType</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">datatype</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"int"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">IntType</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">"text"</span><span class="p">:</span>
<span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">TextType</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrInvalidDatatype</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="p">,</span><span class="w"> </span><span class="nx">dt</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<h4 id="implementing-insert-support">Implementing insert support</h4><p>Keeping things simple, we'll assume the value passed can be correctly
mapped to the type of the column specified.</p>
<p>We'll reference a helper for mapping values to internal storage,
<code>tokenToCell</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Insert</span><span class="p">(</span><span class="nx">inst</span><span class="w"> </span><span class="o">*</span><span class="nx">InsertStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">inst</span><span class="p">.</span><span class="nx">table</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">MemoryCell</span><span class="p">{}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="o">*</span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">table</span><span class="p">.</span><span class="nx">columns</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrMissingValues</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="o">*</span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Skipping non-literal."</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tokenToCell</span><span class="p">(</span><span class="nx">value</span><span class="p">.</span><span class="nx">literal</span><span class="p">))</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">table</span><span class="p">.</span><span class="nx">rows</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
<p>The <code>tokenToCell</code> helper encodes numbers as 32-bit
big-endian integers and stores strings as their raw bytes:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">tokenToCell</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">numericKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">new</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Buffer</span><span class="p">)</span>
<span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">BigEndian</span><span class="p">,</span><span class="w"> </span><span class="nb">int32</span><span class="p">(</span><span class="nx">i</span><span class="p">))</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="nx">buf</span><span class="p">.</span><span class="nx">Bytes</span><span class="p">())</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">stringKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
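<p>To make the storage format concrete, here is a minimal round-trip
sketch. It assumes <code>MemoryCell</code> is a plain byte slice, which
is what the conversions in <code>tokenToCell</code> suggest. The
<code>encodeInt</code> helper is hypothetical and exists only for this
illustration, and the <code>AsInt</code> and <code>AsText</code>
methods here are an assumed shape for the accessors the REPL calls
later.</p>

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// MemoryCell is assumed to be a raw byte slice, as the conversions
// in tokenToCell imply.
type MemoryCell []byte

// encodeInt is a hypothetical helper: it stores an int32 as 4
// big-endian bytes, the same layout tokenToCell writes for numbers.
func encodeInt(i int32) MemoryCell {
	buf := new(bytes.Buffer)
	if err := binary.Write(buf, binary.BigEndian, i); err != nil {
		panic(err)
	}
	return MemoryCell(buf.Bytes())
}

// AsInt decodes a cell that was written as a big-endian int32.
func (mc MemoryCell) AsInt() int32 {
	var i int32
	if err := binary.Read(bytes.NewReader(mc), binary.BigEndian, &i); err != nil {
		panic(err)
	}
	return i
}

// AsText interprets the cell's bytes as a string.
func (mc MemoryCell) AsText() string {
	return string(mc)
}

func main() {
	n := encodeInt(258)
	fmt.Println([]byte(n), n.AsInt()) // [0 0 1 2] 258

	s := MemoryCell("hello")
	fmt.Println(s.AsText()) // hello
}
```

<p>Since the cell itself carries no type tag, the reader has to know
the column type to pick the right decoder. That is why the backend
keeps <code>columnTypes</code> alongside the column names.</p>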
<h4 id="implementing-select-support">Implementing select support</h4><p>Finally, for select we'll iterate over each row in the table and
return the cells according to the columns specified by the AST.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="nx">slct</span><span class="w"> </span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="p">.</span><span class="nx">table</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">Cell</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Type</span><span class="w"> </span><span class="nx">ColumnType</span>
<span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="p">}{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">Cell</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Unsupported, doesn't currently exist, ignore.</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Skipping non-literal expression."</span><span class="p">)</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">lit</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">literal</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">identifierKind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">tableCol</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">tableCol</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">Type</span><span class="w"> </span><span class="nx">ColumnType</span>
<span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span>
<span class="w"> </span><span class="p">}{</span>
<span class="w"> </span><span class="nx">Type</span><span class="p">:</span><span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">columnTypes</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span>
<span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span>
<span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="w"> </span><span class="k">break</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">found</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrColumnDoesNotExist</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">continue</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrColumnDoesNotExist</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">result</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&</span><span class="nx">Results</span><span class="p">{</span>
<span class="w"> </span><span class="nx">Columns</span><span class="p">:</span><span class="w"> </span><span class="nx">columns</span><span class="p">,</span>
<span class="w"> </span><span class="nx">Rows</span><span class="p">:</span><span class="w"> </span><span class="nx">results</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span>
<span class="p">}</span>
</pre></div>
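<p>Stripped of the AST and cell types, the core of
<code>Select</code> is a projection: look up each requested column
name, then copy the matching cell out of every row. The sketch below
shows that idea in isolation. The <code>table</code> struct with
string cells, the <code>project</code> function and
<code>errColumnDoesNotExist</code> are simplified stand-ins invented
for this example, not part of gosql.</p>

```go
package main

import (
	"errors"
	"fmt"
)

// errColumnDoesNotExist is a stand-in for the backend's error value.
var errColumnDoesNotExist = errors.New("column does not exist")

// table is a simplified stand-in: cells are plain strings here.
type table struct {
	columns []string
	rows    [][]string
}

// project returns, for every row, the cells of the requested
// columns in the requested order.
func project(t table, wanted []string) ([][]string, error) {
	// Resolve column names to indexes once, before touching rows.
	indexes := make([]int, 0, len(wanted))
	for _, name := range wanted {
		found := false
		for i, col := range t.columns {
			if col == name {
				indexes = append(indexes, i)
				found = true
				break
			}
		}
		if !found {
			return nil, errColumnDoesNotExist
		}
	}

	results := make([][]string, 0, len(t.rows))
	for _, row := range t.rows {
		result := make([]string, 0, len(indexes))
		for _, i := range indexes {
			result = append(result, row[i])
		}
		results = append(results, result)
	}
	return results, nil
}

func main() {
	users := table{
		columns: []string{"id", "name"},
		rows:    [][]string{{"1", "Phil"}, {"2", "Kate"}},
	}

	rows, err := project(users, []string{"name", "id"})
	fmt.Println(rows, err) // [[Phil 1] [Kate 2]] <nil>

	_, err = project(users, []string{"age"})
	fmt.Println(err) // column does not exist
}
```

<p>One difference worth noting: this sketch resolves columns before
looping over rows, so a missing column is reported even for an empty
table. The <code>Select</code> above resolves columns inside the row
loop, so a query against an empty table returns no column metadata
and never checks the column names.</p>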
<h3 id="the-repl">The REPL</h3><p>At last, we're ready to wrap the parser and in-memory backend in a
REPL. The most complex part is displaying the table of results from a
select query.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"bufio"</span>
<span class="w"> </span><span class="s">"fmt"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"strings"</span>
<span class="w"> </span><span class="s">"github.com/eatonphil/gosql"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">mb</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">NewMemoryBackend</span><span class="p">()</span>
<span class="w"> </span><span class="nx">reader</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.</span><span class="nx">NewReader</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdin</span><span class="p">)</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"Welcome to gosql."</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="s">"# "</span><span class="p">)</span>
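<span class="w"> </span><span class="nx">text</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">reader</span><span class="p">.</span><span class="nx">ReadString</span><span class="p">(</span><span class="sc">'\n'</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>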
<span class="w"> </span><span class="nx">text</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Replace</span><span class="p">(</span><span class="nx">text</span><span class="p">,</span><span class="w"> </span><span class="s">"\n"</span><span class="p">,</span><span class="w"> </span><span class="s">""</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">Kind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">CreateTableKind</span><span class="p">:</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">CreateTableStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"ok"</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">InsertKind</span><span class="p">:</span>
<span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">Insert</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">InsertStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"ok"</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">SelectKind</span><span class="p">:</span>
<span class="w"> </span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">Select</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">SelectStatement</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">Columns</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"| %s "</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">Name</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"|"</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p"><</span><span class="w"> </span><span class="mi">20</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"="</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">Rows</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">"|"</span><span class="p">)</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">cell</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">typ</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">Columns</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Type</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">""</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">typ</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">IntType</span><span class="p">:</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">"%d"</span><span class="p">,</span><span class="w"> </span><span class="nx">cell</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">())</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">TextType</span><span class="p">:</span>
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">cell</span><span class="p">.</span><span class="nx">AsText</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">" %s | "</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">()</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">"ok"</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Putting it all together:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>run<span class="w"> </span>*.go
Welcome<span class="w"> </span>to<span class="w"> </span>gosql.
<span class="c1"># CREATE TABLE users (id INT, name TEXT);</span>
ok
<span class="c1"># INSERT INTO users VALUES (1, 'Phil');</span>
ok
<span class="c1"># SELECT id, name FROM users;</span>
<span class="p">|</span><span class="w"> </span>id<span class="w"> </span><span class="p">|</span><span class="w"> </span>name<span class="w"> </span><span class="p">|</span>
<span class="o">====================</span>
<span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>Phil<span class="w"> </span><span class="p">|</span>
ok
<span class="c1"># INSERT INTO users VALUES (2, 'Kate');</span>
ok
<span class="c1"># SELECT name, id FROM users;</span>
<span class="p">|</span><span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>id<span class="w"> </span><span class="p">|</span>
<span class="o">====================</span>
<span class="p">|</span><span class="w"> </span>Phil<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span>
<span class="p">|</span><span class="w"> </span>Kate<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">|</span>
ok
</pre></div>
<p>And we've got a very simple SQL database!</p>
<p>Next up we'll get into filtering, sorting, and indexing.</p>
<h4 id="further-reading">Further reading</h4><ul>
<li><a href="/writing-a-simple-json-parser.html">Writing a simple JSON parser</a><ul>
<li>This post goes into a little more detail about the theory and basics of parsing.</li>
</ul>
</li>
<li><a href="https://www.goodreads.com/book/show/617120.Database_Systems">Database Systems: A Practical Approach to Design, Implementation and Management</a><ul>
<li>A giant book, but an excellent and very easy introduction to database theory.</li>
</ul>
</li>
</ul>
<p><blockquote class="twitter-tweet" data-conversation="none"><p lang="en" dir="ltr">Latest blog post: writing a simple SQL database from scratch in Go <a href="https://t.co/csQmNhWIEf">https://t.co/csQmNhWIEf</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1237522975143776256?ref_src=twsrc%5Etfw">March 10, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/database-basics.htmlFri, 06 Mar 2020 00:00:00 +0000
- A minimal REST API in Javahttp://notes.eatonphil.com/a-minimal-rest-api-in-java.html<p>There's a style of Java that is a joy to write. This post will cover
how to set up a basic PostgreSQL-integrated REST API using
<a href="https://eclipse-ee4j.github.io/jersey/">Jersey</a> and
<a href="https://www.jooq.org/">JOOQ</a> in a style not dissimilar to Flask and
SQLAlchemy in Python.</p>
<p>In particular, we'll try to avoid as much runtime
reflection/class-loading as possible. This will make the application
less flexible but easier to debug and understand.</p>
<p>I'd appreciate pointers in email if you see anything weird or can fix
any of my bugs.</p>
<h3 id="dependencies">Dependencies</h3><p>Install <a href="https://maven.apache.org/">Maven</a>, a recent
<a href="https://openjdk.java.net/">JDK</a>, and PostgreSQL.</p>
<p>Copy the following into <code>pom.xml</code> to tell Maven about Java
dependencies:</p>
<div class="highlight"><pre><span></span><span class="nt"><project</span><span class="w"> </span><span class="na">xmlns=</span><span class="s">"http://maven.apache.org/POM/4.0.0"</span><span class="w"> </span><span class="na">xmlns:xsi=</span><span class="s">"http://www.w3.org/2001/XMLSchema-instance"</span>
<span class="w"> </span><span class="na">xsi:schemaLocation=</span><span class="s">"http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"</span><span class="nt">></span>
<span class="w"> </span><span class="nt"><modelVersion></span>4.0.0<span class="nt"></modelVersion></span>
<span class="w"> </span><span class="nt"><groupId></span>api<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>api<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>1.0-SNAPSHOT<span class="nt"></version></span>
<span class="w"> </span><span class="nt"><properties></span>
<span class="w"> </span><span class="nt"><maven.compiler.source></span>13<span class="nt"></maven.compiler.source></span>
<span class="w"> </span><span class="nt"><maven.compiler.target></span>13<span class="nt"></maven.compiler.target></span>
<span class="w"> </span><span class="nt"></properties></span>
<span class="w"> </span><span class="nt"><build></span>
<span class="w"> </span><span class="nt"><plugins></span>
<span class="w"> </span><span class="nt"><plugin></span>
<span class="w"> </span><span class="nt"><groupId></span>org.apache.maven.plugins<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>maven-compiler-plugin<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>3.8.1<span class="nt"></version></span>
<span class="w"> </span><span class="nt"><configuration></span>
<span class="w"> </span><span class="nt"><compilerArgs></span>
<span class="w"> </span><span class="nt"><arg></span>-Xlint:all,-options,-path<span class="nt"></arg></span>
<span class="w"> </span><span class="nt"></compilerArgs></span>
<span class="w"> </span><span class="nt"></configuration></span>
<span class="w"> </span><span class="nt"></plugin></span>
<span class="w"> </span><span class="nt"><plugin></span>
<span class="w"> </span><span class="nt"><groupId></span>org.codehaus.mojo<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>exec-maven-plugin<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>1.6.0<span class="nt"></version></span>
<span class="w"> </span><span class="nt"><configuration></span>
<span class="w"> </span><span class="nt"><mainClass></span>api.Main<span class="nt"></mainClass></span>
<span class="w"> </span><span class="nt"></configuration></span>
<span class="w"> </span><span class="nt"></plugin></span>
<span class="w"> </span><span class="nt"></plugins></span>
<span class="w"> </span><span class="nt"></build></span>
<span class="w"> </span><span class="nt"><dependencies></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.glassfish.jersey.containers<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>jersey-container-jetty-http<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>2.30<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.jooq<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>jooq<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>3.12.3<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.jooq<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>jooq-meta<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>3.12.3<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.postgresql<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>postgresql<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>42.2.9<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.glassfish.jersey.inject<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>jersey-hk2<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>2.30<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>ch.qos.logback<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>logback-core<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>1.2.3<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.slf4j<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>slf4j-api<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>1.7.30<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>ch.qos.logback<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>logback-classic<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>1.2.3<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.glassfish.jersey.media<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>jersey-media-json-jackson<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>2.30<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>javax.persistence<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>javax.persistence-api<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>2.2<span class="nt"></version></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>org.projectlombok<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>lombok<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>1.18.10<span class="nt"></version></span>
<span class="w"> </span><span class="nt"><scope></span>provided<span class="nt"></scope></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"><dependency></span>
<span class="w"> </span><span class="nt"><groupId></span>com.fasterxml.jackson<span class="nt"></groupId></span>
<span class="w"> </span><span class="nt"><artifactId></span>jackson-bom<span class="nt"></artifactId></span>
<span class="w"> </span><span class="nt"><version></span>2.10.2<span class="nt"></version></span>
<span class="w"> </span><span class="nt"><type></span>pom<span class="nt"></type></span>
<span class="w"> </span><span class="nt"></dependency></span>
<span class="w"> </span><span class="nt"></dependencies></span>
<span class="nt"></project></span>
</pre></div>
<p>Now run <code>mvn install</code> to download and configure all dependencies.</p>
<h3 id="project-setup">Project setup</h3><p>The <code>Main</code> class in
<code>src/main/java/api/Main.java</code> will be our entrypoint.</p>
<p>It will handle loading configuration, setting up the application
server, and starting it.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">api.app.Application</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">api.app.Config</span><span class="p">;</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">Main</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">String</span><span class="o">[]</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">cfg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Config</span><span class="p">();</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Application</span><span class="p">(</span><span class="n">cfg</span><span class="p">);</span>
<span class="w"> </span><span class="n">server</span><span class="p">.</span><span class="na">start</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">(</span><span class="n">Exception</span><span class="w"> </span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">e</span><span class="p">.</span><span class="na">printStackTrace</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>The <code>Config</code> class in
<code>src/main/java/api/app/Config.java</code> will contain a few
hard-coded settings for now. In the future these could be read from a
file.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api.app</span><span class="p">;</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="kd">class</span> <span class="nc">Config</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">server_address</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"http://localhost"</span><span class="p">;</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">server_port</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7780</span><span class="p">;</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">db_connection</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"jdbc:postgresql://localhost/todo"</span><span class="p">;</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">db_username</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"todo"</span><span class="p">;</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">db_password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"todo"</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
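<p>As a rough sketch of what reading these settings from a file might
look like, the hypothetical <code>FileConfig</code> class below loads
them with <code>java.util.Properties</code>. The class name, the
property keys (<code>server.port</code> and so on), and the fallback
to the hard-coded values are all assumptions for illustration; nothing
else in this post depends on them.</p>

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical file-backed variant of Config. Field names mirror the
// Config class above; the property keys are made up for this sketch.
final class FileConfig {
    public final String server_address;
    public final int server_port;
    public final String db_connection;
    public final String db_username;
    public final String db_password;

    private FileConfig(final Properties props) {
        // Fall back to the hard-coded values when a key is missing.
        server_address = props.getProperty("server.address", "http://localhost");
        server_port = Integer.parseInt(props.getProperty("server.port", "7780"));
        db_connection = props.getProperty("db.connection", "jdbc:postgresql://localhost/todo");
        db_username = props.getProperty("db.username", "todo");
        db_password = props.getProperty("db.password", "todo");
    }

    // Read key=value pairs from any Reader.
    static FileConfig load(final Reader source) {
        var props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new FileConfig(props);
    }

    // Convenience for trying the parser without touching the disk.
    static FileConfig parse(final String text) {
        return load(new StringReader(text));
    }

    // Read the settings from a properties file on disk.
    static FileConfig fromFile(final Path path) {
        try (var reader = Files.newBufferedReader(path)) {
            return load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

<p>Calling <code>FileConfig.fromFile(Path.of("config.properties"))</code>
would read the same keys from disk; the file name is likewise made
up.</p>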
<p>And finally the <code>Application</code> class
in <code>src/main/java/api/app/Application.java</code> will handle
opening a PostgreSQL connection, registering the package to scan
for Jersey routes/controllers, registering the PostgreSQL connection
with the dependency injection container, and starting the Jetty
server that hosts Jersey.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api.app</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">javax.ws.rs.core.UriBuilder</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.glassfish.jersey.internal.inject.AbstractBinder</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.glassfish.jersey.jetty.JettyHttpContainerFactory</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.glassfish.jersey.server.ResourceConfig</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.slf4j.LoggerFactory</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">api.dao.Dao</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">ch.qos.logback.classic.Level</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">ch.qos.logback.classic.Logger</span><span class="p">;</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">Application</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">Logger</span><span class="w"> </span><span class="n">logger</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">Logger</span><span class="p">)</span><span class="w"> </span><span class="n">LoggerFactory</span><span class="p">.</span><span class="na">getLogger</span><span class="p">(</span><span class="n">Application</span><span class="p">.</span><span class="na">class</span><span class="p">);</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">Logger</span><span class="w"> </span><span class="n">rootLogger</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">Logger</span><span class="p">)</span><span class="w"> </span><span class="n">LoggerFactory</span><span class="p">.</span><span class="na">getLogger</span><span class="p">(</span><span class="n">Logger</span><span class="p">.</span><span class="na">ROOT_LOGGER_NAME</span><span class="p">);</span>
<span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">rootLogger</span><span class="p">.</span><span class="na">setLevel</span><span class="p">(</span><span class="n">Level</span><span class="p">.</span><span class="na">INFO</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Config</span><span class="w"> </span><span class="n">cfg</span><span class="p">;</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="nf">Application</span><span class="p">(</span><span class="kd">final</span><span class="w"> </span><span class="n">Config</span><span class="w"> </span><span class="n">_cfg</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">cfg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_cfg</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">addShutdownHook</span><span class="p">(</span><span class="kd">final</span><span class="w"> </span><span class="n">Runnable</span><span class="w"> </span><span class="n">hook</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Runtime</span><span class="p">.</span><span class="na">getRuntime</span><span class="p">().</span><span class="na">addShutdownHook</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">Thread</span><span class="p">(</span><span class="n">hook</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">start</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">dao</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Dao</span><span class="p">(</span><span class="n">cfg</span><span class="p">.</span><span class="na">db_connection</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="na">db_username</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="na">db_password</span><span class="p">);</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">dao</span><span class="p">.</span><span class="na">initialize</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">(</span><span class="n">Exception</span><span class="w"> </span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">e</span><span class="p">.</span><span class="na">printStackTrace</span><span class="p">();</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">addShutdownHook</span><span class="p">(()</span><span class="w"> </span><span class="o">-></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">dao</span><span class="p">.</span><span class="na">close</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">(</span><span class="n">java</span><span class="p">.</span><span class="na">sql</span><span class="p">.</span><span class="na">SQLException</span><span class="w"> </span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">e</span><span class="p">.</span><span class="na">printStackTrace</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">resourceConfig</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ResourceConfig</span><span class="p">();</span>
<span class="w"> </span><span class="n">resourceConfig</span><span class="p">.</span><span class="na">packages</span><span class="p">(</span><span class="s">"api.controller"</span><span class="p">);</span>
<span class="w"> </span><span class="n">resourceConfig</span><span class="p">.</span><span class="na">register</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">AbstractBinder</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Override</span>
<span class="w"> </span><span class="kd">protected</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">configure</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">bind</span><span class="p">(</span><span class="n">dao</span><span class="p">).</span><span class="na">to</span><span class="p">(</span><span class="n">Dao</span><span class="p">.</span><span class="na">class</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">baseUri</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UriBuilder</span><span class="p">.</span><span class="na">fromUri</span><span class="p">(</span><span class="n">cfg</span><span class="p">.</span><span class="na">server_address</span><span class="p">).</span><span class="na">port</span><span class="p">(</span><span class="n">cfg</span><span class="p">.</span><span class="na">server_port</span><span class="p">).</span><span class="na">build</span><span class="p">();</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JettyHttpContainerFactory</span><span class="p">.</span><span class="na">createServer</span><span class="p">(</span><span class="n">baseUri</span><span class="p">,</span><span class="w"> </span><span class="n">resourceConfig</span><span class="p">);</span>
<span class="w"> </span><span class="n">logger</span><span class="p">.</span><span class="na">info</span><span class="p">(</span><span class="s">"Started listening on {}:{}"</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="na">server_address</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="na">server_port</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
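<p>The shutdown hook pattern used in <code>start()</code> can be tried
on its own with only the standard library. In this sketch
<code>FakeDao</code> and <code>ShutdownHooks</code> are hypothetical
stand-ins, not part of the application: the first is any resource with
a <code>close()</code> method, the second wraps that call in a
<code>Runnable</code> and registers it with the JVM.</p>

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the Dao: it only records that close() ran.
final class FakeDao implements AutoCloseable {
    final AtomicBoolean closed = new AtomicBoolean(false);

    @Override
    public void close() {
        closed.set(true);
    }
}

final class ShutdownHooks {
    // Wrap close() so that a failure is printed instead of propagating
    // out of the hook thread, as in Application.start().
    static Runnable closing(final AutoCloseable resource) {
        return () -> {
            try {
                resource.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        };
    }

    // Register the hook with the JVM and return the Thread so a caller
    // can remove it again with Runtime.removeShutdownHook.
    static Thread register(final Runnable hook) {
        var thread = new Thread(hook, "shutdown-hook");
        Runtime.getRuntime().addShutdownHook(thread);
        return thread;
    }
}
```

<p>The JVM runs registered hooks when it exits normally or is
interrupted (for example with Ctrl-C), which is why the hook is a
reasonable place to close the database connection.</p>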
<p class="note">
I couldn't figure out a reasonable way to avoid the class path
registration for routes.
<br />
<br />
  Also note that the <code>AbstractBinder</code>
  appears to search the class path implicitly for an available
  dependency injection implementation. I'd rather specify it
  explicitly but I'm not sure how. The lookup succeeds because we
  added
  <a href="https://javaee.github.io/hk2/">HK2</a> as a dependency
  (see <code>jersey-hk2</code> in <code>pom.xml</code>).
</p><p>With the <code>Application</code> code finished, we'll need to build
out the referenced <code>Dao</code> and controller classes.</p>
<h3 id="dao">Dao</h3><p>The <code>Dao</code> class in
<code>src/main/java/api/dao/Dao.java</code> will wrap the
connection to PostgreSQL via JOOQ.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api.dao</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">java.sql.Connection</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">java.sql.DriverManager</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">java.sql.SQLException</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.jooq.DSLContext</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.jooq.SQLDialect</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.jooq.impl.DSL</span><span class="p">;</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">Dao</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">Connection</span><span class="w"> </span><span class="n">conn</span><span class="p">;</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">url</span><span class="p">;</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">username</span><span class="p">;</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">password</span><span class="p">;</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="nf">Dao</span><span class="p">(</span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">_url</span><span class="p">,</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">_username</span><span class="p">,</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">_password</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_url</span><span class="p">;</span>
<span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_username</span><span class="p">;</span>
<span class="w"> </span><span class="n">password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_password</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">initialize</span><span class="p">()</span><span class="w"> </span><span class="kd">throws</span><span class="w"> </span><span class="n">SQLException</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">conn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">DriverManager</span><span class="p">.</span><span class="na">getConnection</span><span class="p">(</span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">close</span><span class="p">()</span><span class="w"> </span><span class="kd">throws</span><span class="w"> </span><span class="n">SQLException</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">conn</span><span class="p">.</span><span class="na">close</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="n">DSLContext</span><span class="w"> </span><span class="nf">getDSLContext</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">DSL</span><span class="p">.</span><span class="na">using</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span><span class="w"> </span><span class="n">SQLDialect</span><span class="p">.</span><span class="na">POSTGRES</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>And this will be enough to use in our controller. But let's take a
moment to talk about the data model.</p>
<h3 id="data">Data</h3><p>This API will return results from a TODO list. The database should
store each TODO item and a timestamp of completion, if completed.</p>
<p>We'll start by creating a database and user for the application:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>su<span class="w"> </span>postgres
postgres<span class="w"> </span>$<span class="w"> </span>psql
<span class="nv">postgres</span><span class="o">=</span><span class="c1"># CREATE DATABASE todo;</span>
<span class="nv">postgres</span><span class="o">=</span><span class="c1"># CREATE USER todo WITH PASSWORD 'todo';</span>
<span class="nv">postgres</span><span class="o">=</span><span class="c1"># GRANT ALL ON DATABASE todo TO todo;</span>
</pre></div>
<p>Then we'll write an initial migration:</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="n">migrations</span><span class="o">/</span><span class="mi">1</span><span class="n">_init</span><span class="p">.</span><span class="k">sql</span>
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">todo_item</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="n">BIGSERIAL</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span>
<span class="w"> </span><span class="n">item</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span>
<span class="w"> </span><span class="n">created_at</span><span class="w"> </span><span class="n">TIMESTAMPTZ</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">DEFAULT</span><span class="w"> </span><span class="n">NOW</span><span class="p">(),</span>
<span class="w"> </span><span class="n">completed_at</span><span class="w"> </span><span class="n">TIMESTAMPTZ</span>
<span class="p">);</span>
</pre></div>
<p>And a helper script for running migrations:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>scripts/migrate.sh
<span class="c1">#!/usr/bin/env bash</span>
<span class="nb">set</span><span class="w"> </span>-e
<span class="nb">export</span><span class="w"> </span><span class="nv">PGPASSWORD</span><span class="o">=</span>todo
<span class="k">for</span><span class="w"> </span>file<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="k">$(</span>ls<span class="w"> </span>migrations<span class="k">)</span><span class="p">;</span><span class="w"> </span><span class="k">do</span>
<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">"Running migration: </span><span class="nv">$file</span><span class="s2">"</span>
<span class="w"> </span>psql<span class="w"> </span>-U<span class="w"> </span>todo<span class="w"> </span>-f<span class="w"> </span><span class="s2">"migrations/</span><span class="nv">$file</span><span class="s2">"</span>
<span class="k">done</span>
</pre></div>
<p>Run it:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>chmod<span class="w"> </span>+x<span class="w"> </span>./scripts/migrate.sh
$<span class="w"> </span>./scripts/migrate.sh
Running<span class="w"> </span>migration:<span class="w"> </span>1_init.sql
CREATE<span class="w"> </span>TABLE
</pre></div>
<p>And let's add some data:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>su<span class="w"> </span>postgres
postgres<span class="w"> </span>$<span class="w"> </span>psql<span class="w"> </span>-U<span class="w"> </span>todo
<span class="nv">todo</span><span class="o">=</span><span class="c1"># INSERT INTO todo_item (item) VALUES ('My note');</span>
</pre></div>
<p>Now we're ready to model the data in Java.</p>
<h3 id="models">Models</h3><p>While it's possible to have <a href="https://www.jooq.org/doc/3.12/manual/code-generation/">JOOQ generate Java data
classes</a> (or
POJOs) by reading the database schema, the generated class cannot be
directly serialized to a JSON string.</p>
<p>So for each table (there's only one) we'll write a class with fields
for each column. We'll use the <a href="https://javaee.github.io/tutorial/persistence-intro.html">Java Persistence
API</a> (JPA)
to annotate the class and fields so JOOQ will know how to deserialize
query results into an instance of the model.</p>
<p>We'll use <a href="https://projectlombok.org/">Lombok</a> to annotate the whole
class with <code>@Data</code> so that getter and setter methods are
generated automatically for each private field. And we'll use a
<a href="https://github.com/FasterXML/jackson">Jackson</a> annotation to set
the JSON field name for each column.</p>
<p>This is the <code>TodoItem</code> class in
<code>src/main/java/api/model/TodoItem.java</code>:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api.model</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">java.time.OffsetDateTime</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">javax.persistence.Column</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">javax.persistence.Id</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">javax.persistence.Table</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">com.fasterxml.jackson.annotation.JsonFormat</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">com.fasterxml.jackson.annotation.JsonProperty</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">lombok.Data</span><span class="p">;</span>
<span class="nd">@Data</span>
<span class="nd">@Table</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"todo_item"</span><span class="p">)</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">TodoItem</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Column</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"id"</span><span class="p">)</span>
<span class="w"> </span><span class="nd">@JsonProperty</span><span class="p">(</span><span class="s">"id"</span><span class="p">)</span>
<span class="w"> </span><span class="nd">@Id</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">id</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@Column</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"name"</span><span class="p">)</span>
<span class="w"> </span><span class="nd">@JsonProperty</span><span class="p">(</span><span class="s">"name"</span><span class="p">)</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">name</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@Column</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"created_at"</span><span class="p">)</span>
<span class="w"> </span><span class="nd">@JsonProperty</span><span class="p">(</span><span class="s">"createdAt"</span><span class="p">)</span>
<span class="w"> </span><span class="nd">@JsonFormat</span><span class="p">(</span><span class="n">pattern</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"yyyy-MM-dd'T'HH:mm:ssZ"</span><span class="p">)</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">OffsetDateTime</span><span class="w"> </span><span class="n">createdAt</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@Column</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"completed_at"</span><span class="p">)</span>
<span class="w"> </span><span class="nd">@JsonProperty</span><span class="p">(</span><span class="s">"completedAt"</span><span class="p">)</span>
<span class="w"> </span><span class="nd">@JsonFormat</span><span class="p">(</span><span class="n">pattern</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">"yyyy-MM-dd'T'HH:mm:ssZ"</span><span class="p">)</span>
<span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">OffsetDateTime</span><span class="w"> </span><span class="n">completedAt</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p class="note">
The JSON format specifications for the timestamp fields aren't
actually working. The formatted JSON returns a giant object and I
haven't figured out how to get it to serialize to the RFC3339 string
yet.
</p><p>We're set! The last step is a simple controller to return a list of
TODO items.</p>
<h3 id="the-/items-controller">The /items controller</h3><p>In the <code>ItemsController</code> class in
<code>src/main/java/api/controller/ItemsController.java</code> we'll inject
the <code>Dao</code> object and use it to return a page of TODO items
as JSON.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api.controller</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">java.util.List</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">javax.inject.Inject</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">javax.persistence.Table</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">javax.ws.rs.GET</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">javax.ws.rs.Path</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">javax.ws.rs.Produces</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">javax.ws.rs.core.MediaType</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">org.jooq.DSLContext</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">api.dao.Dao</span><span class="p">;</span>
<span class="kn">import</span><span class="w"> </span><span class="nn">api.model.TodoItem</span><span class="p">;</span>
<span class="nd">@Path</span><span class="p">(</span><span class="s">"items"</span><span class="p">)</span>
<span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">ItemsController</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nd">@Inject</span>
<span class="w"> </span><span class="n">Dao</span><span class="w"> </span><span class="n">dao</span><span class="p">;</span>
<span class="w"> </span><span class="nd">@GET</span>
<span class="w"> </span><span class="nd">@Produces</span><span class="p">(</span><span class="n">MediaType</span><span class="p">.</span><span class="na">APPLICATION_JSON</span><span class="p">)</span>
<span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="n">List</span><span class="o"><</span><span class="n">TodoItem</span><span class="o">></span><span class="w"> </span><span class="nf">getItems</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">DSLContext</span><span class="w"> </span><span class="n">dslCtx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dao</span><span class="p">.</span><span class="na">getDSLContext</span><span class="p">();</span>
<span class="w"> </span><span class="n">Table</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoItem</span><span class="p">.</span><span class="na">class</span><span class="p">.</span><span class="na">getAnnotation</span><span class="p">(</span><span class="n">Table</span><span class="p">.</span><span class="na">class</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">dslCtx</span><span class="p">.</span><span class="na">select</span><span class="p">().</span><span class="na">from</span><span class="p">(</span><span class="n">table</span><span class="p">.</span><span class="na">name</span><span class="p">()).</span><span class="na">fetch</span><span class="p">().</span><span class="na">into</span><span class="p">(</span><span class="n">TodoItem</span><span class="p">.</span><span class="na">class</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p class="note">
There's some more implicit magic here when we return a list of
<code>TodoItem</code>s. Since we marked the endpoint as producing
JSON, and since Jackson is in our class path, Jersey will
automatically use Jackson to serialize the list to JSON.
<br />
<br />
The API is quite nice but I could do without the automatic
class-loading magic.
</p><p>Now we're ready to build, run and test.</p>
<h3 id="building-and-running">Building and running</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>mvn<span class="w"> </span>clean<span class="w"> </span>compile
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Scanning<span class="w"> </span><span class="k">for</span><span class="w"> </span>projects...
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>------------------------------<<span class="w"> </span>api:api<span class="w"> </span>>-------------------------------
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>api<span class="w"> </span><span class="m">1</span>.0-SNAPSHOT
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>--------------------------------<span class="o">[</span><span class="w"> </span>jar<span class="w"> </span><span class="o">]</span>---------------------------------
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>---<span class="w"> </span>maven-clean-plugin:2.5:clean<span class="w"> </span><span class="o">(</span>default-clean<span class="o">)</span><span class="w"> </span>@<span class="w"> </span>api<span class="w"> </span>---
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Deleting<span class="w"> </span>/Users/philipeaton/tmp/test/target
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>---<span class="w"> </span>maven-resources-plugin:2.6:resources<span class="w"> </span><span class="o">(</span>default-resources<span class="o">)</span><span class="w"> </span>@<span class="w"> </span>api<span class="w"> </span>---
<span class="o">[</span>WARNING<span class="o">]</span><span class="w"> </span>Using<span class="w"> </span>platform<span class="w"> </span>encoding<span class="w"> </span><span class="o">(</span>UTF-8<span class="w"> </span>actually<span class="o">)</span><span class="w"> </span>to<span class="w"> </span>copy<span class="w"> </span>filtered<span class="w"> </span>resources,<span class="w"> </span>i.e.<span class="w"> </span>build<span class="w"> </span>is<span class="w"> </span>platform<span class="w"> </span>dependent!
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>skip<span class="w"> </span>non<span class="w"> </span>existing<span class="w"> </span>resourceDirectory<span class="w"> </span>/Users/philipeaton/tmp/test/src/main/resources
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>---<span class="w"> </span>maven-compiler-plugin:3.8.1:compile<span class="w"> </span><span class="o">(</span>default-compile<span class="o">)</span><span class="w"> </span>@<span class="w"> </span>api<span class="w"> </span>---
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Changes<span class="w"> </span>detected<span class="w"> </span>-<span class="w"> </span>recompiling<span class="w"> </span>the<span class="w"> </span>module!
<span class="o">[</span>WARNING<span class="o">]</span><span class="w"> </span>File<span class="w"> </span>encoding<span class="w"> </span>has<span class="w"> </span>not<span class="w"> </span>been<span class="w"> </span>set,<span class="w"> </span>using<span class="w"> </span>platform<span class="w"> </span>encoding<span class="w"> </span>UTF-8,<span class="w"> </span>i.e.<span class="w"> </span>build<span class="w"> </span>is<span class="w"> </span>platform<span class="w"> </span>dependent!
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Compiling<span class="w"> </span><span class="m">6</span><span class="w"> </span><span class="nb">source</span><span class="w"> </span>files<span class="w"> </span>to<span class="w"> </span>/Users/philipeaton/tmp/test/target/classes
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>------------------------------------------------------------------------
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>BUILD<span class="w"> </span>SUCCESS
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>------------------------------------------------------------------------
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Total<span class="w"> </span>time:<span class="w"> </span><span class="m">2</span>.198<span class="w"> </span>s
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Finished<span class="w"> </span>at:<span class="w"> </span><span class="m">2020</span>-02-01T17:07:14-05:00
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>------------------------------------------------------------------------
$<span class="w"> </span>mvn<span class="w"> </span>exec:java
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Scanning<span class="w"> </span><span class="k">for</span><span class="w"> </span>projects...
<span class="o">[</span>INFO<span class="o">]</span>
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>------------------------------<<span class="w"> </span>api:api<span class="w"> </span>>-------------------------------
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>api<span class="w"> </span><span class="m">1</span>.0-SNAPSHOT
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>--------------------------------<span class="o">[</span><span class="w"> </span>jar<span class="w"> </span><span class="o">]</span>---------------------------------
<span class="o">[</span>INFO<span class="o">]</span>
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>---<span class="w"> </span>exec-maven-plugin:1.6.0:java<span class="w"> </span><span class="o">(</span>default-cli<span class="o">)</span><span class="w"> </span>@<span class="w"> </span>api<span class="w"> </span>---
<span class="m">17</span>:06:53.793<span class="w"> </span><span class="o">[</span>api.Main.main<span class="o">()]</span><span class="w"> </span>INFO<span class="w"> </span>org.eclipse.jetty.util.log<span class="w"> </span>-<span class="w"> </span>Logging<span class="w"> </span>initialized<span class="w"> </span>@2017ms<span class="w"> </span>to<span class="w"> </span>org.eclipse.jetty.util.log.Slf4jLog
<span class="m">17</span>:06:54.378<span class="w"> </span><span class="o">[</span>api.Main.main<span class="o">()]</span><span class="w"> </span>INFO<span class="w"> </span>org.eclipse.jetty.server.Server<span class="w"> </span>-<span class="w"> </span>jetty-9.4.17.v20190418<span class="p">;</span><span class="w"> </span>built:<span class="w"> </span><span class="m">2019</span>-04-18T19:45:35.259Z<span class="p">;</span><span class="w"> </span>git:<span class="w"> </span>aa1c656c315c011c01e7b21aabb04066635b9f67<span class="p">;</span><span class="w"> </span>jvm<span class="w"> </span><span class="m">13</span>+33
<span class="m">17</span>:06:54.425<span class="w"> </span><span class="o">[</span>api.Main.main<span class="o">()]</span><span class="w"> </span>INFO<span class="w"> </span>org.eclipse.jetty.server.AbstractConnector<span class="w"> </span>-<span class="w"> </span>Started<span class="w"> </span>ServerConnector@3943a159<span class="o">{</span>HTTP/1.1,<span class="o">[</span>http/1.1<span class="o">]}{</span><span class="m">0</span>.0.0.0:7780<span class="o">}</span>
<span class="m">17</span>:06:54.425<span class="w"> </span><span class="o">[</span>api.Main.main<span class="o">()]</span><span class="w"> </span>INFO<span class="w"> </span>org.eclipse.jetty.server.Server<span class="w"> </span>-<span class="w"> </span>Started<span class="w"> </span>@2651ms
<span class="m">17</span>:06:54.425<span class="w"> </span><span class="o">[</span>api.Main.main<span class="o">()]</span><span class="w"> </span>INFO<span class="w"> </span>api.app.Application<span class="w"> </span>-<span class="w"> </span>Started<span class="w"> </span>listening<span class="w"> </span>on<span class="w"> </span>http://localhost:7780
</pre></div>
<p>In a new terminal curl the endpoint:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>localhost:7780/items<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq
<span class="o">[</span>
<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"id"</span>:<span class="w"> </span><span class="m">1</span>,
<span class="w"> </span><span class="s2">"name"</span>:<span class="w"> </span>null,
<span class="w"> </span><span class="s2">"createdAt"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"offset"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"totalSeconds"</span>:<span class="w"> </span>-18000,
<span class="w"> </span><span class="s2">"id"</span>:<span class="w"> </span><span class="s2">"-05:00"</span>,
<span class="w"> </span><span class="s2">"rules"</span>:<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="s2">"transitions"</span>:<span class="w"> </span><span class="o">[]</span>,
<span class="w"> </span><span class="s2">"transitionRules"</span>:<span class="w"> </span><span class="o">[]</span>,
<span class="w"> </span><span class="s2">"fixedOffset"</span>:<span class="w"> </span><span class="nb">true</span>
<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"dayOfWeek"</span>:<span class="w"> </span><span class="s2">"SATURDAY"</span>,
<span class="w"> </span><span class="s2">"dayOfYear"</span>:<span class="w"> </span><span class="m">32</span>,
<span class="w"> </span><span class="s2">"nano"</span>:<span class="w"> </span><span class="m">594440000</span>,
<span class="w"> </span><span class="s2">"year"</span>:<span class="w"> </span><span class="m">2020</span>,
<span class="w"> </span><span class="s2">"monthValue"</span>:<span class="w"> </span><span class="m">2</span>,
<span class="w"> </span><span class="s2">"dayOfMonth"</span>:<span class="w"> </span><span class="m">1</span>,
<span class="w"> </span><span class="s2">"hour"</span>:<span class="w"> </span><span class="m">17</span>,
<span class="w"> </span><span class="s2">"minute"</span>:<span class="w"> </span><span class="m">8</span>,
<span class="w"> </span><span class="s2">"second"</span>:<span class="w"> </span><span class="m">0</span>,
<span class="w"> </span><span class="s2">"month"</span>:<span class="w"> </span><span class="s2">"FEBRUARY"</span>
<span class="w"> </span><span class="o">}</span>,
<span class="w"> </span><span class="s2">"completedAt"</span>:<span class="w"> </span>null
<span class="w"> </span><span class="o">}</span>
<span class="o">]</span>
</pre></div>
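<p>The nested <code>createdAt</code> object in that response is the
timestamp serialization problem mentioned in the note on the model. As
a rough sketch of the string we would rather see, the standard
library's <code>DateTimeFormatter.ISO_OFFSET_DATE_TIME</code> produces
an RFC3339-style value from an <code>OffsetDateTime</code>. The class
and method names below are made up for illustration and are not part
of the project. Getting Jackson itself to emit this string is a
separate step (a commonly used route is registering
<code>JavaTimeModule</code> from <code>jackson-datatype-jsr310</code>),
which is not wired into this project, so treat that part as an
untested assumption here.</p>

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of the project above.
class Rfc3339Example {
    // Drop sub-second precision and format with the ISO offset formatter,
    // which yields strings such as 2020-02-01T17:08:00-05:00.
    static String toRfc3339(OffsetDateTime timestamp) {
        return timestamp
            .truncatedTo(ChronoUnit.SECONDS)
            .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }

    public static void main(String[] args) {
        // Same instant as the createdAt object in the response above.
        OffsetDateTime createdAt = OffsetDateTime.of(
            2020, 2, 1, 17, 8, 0, 594440000, ZoneOffset.ofHours(-5));
        System.out.println(toRfc3339(createdAt)); // 2020-02-01T17:08:00-05:00
    }
}
```

<p>A helper like this could back a custom getter on the model, at the
cost of formatting timestamps by hand rather than through Jackson.</p>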
<p>And we're done!</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I really enjoy using Java for REST APIs, avoiding Spring and Play. Use simple but mature libraries that are no more difficult to cobble together than everything you must do in Go or Flask for a REST API. vs Go you get generics and vs python you get safety<a href="https://t.co/twmjZprow6">https://t.co/twmjZprow6</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1223733417453465601?ref_src=twsrc%5Etfw">February 1, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/a-minimal-rest-api-in-java.htmlSat, 01 Feb 2020 00:00:00 +0000
- Writing a lisp compiler from scratch in JavaScript: 6. an x86 upgradehttp://notes.eatonphil.com/compiler-basics-an-x86-upgrade.html<p class="note">
Previously in compiler basics:
<!-- forgive me, for I have sinned -->
<br />
<a href="/compiler-basics-lisp-to-assembly.html">1. lisp to assembly</a>
<br />
<a href="/compiler-basics-functions.html">2. user-defined functions and variables</a>
<br />
<a href="/compiler-basics-llvm.html">3. LLVM</a>
<br />
<a href="/compiler-basics-llvm-conditionals.html">4. LLVM conditionals and compiling fibonacci</a>
<br />
<a href="/compiler-basics-llvm-system-calls.html">5. LLVM system calls</a>
</p><p>This post upgrades the ulisp x86 backend from using a limited set of
registers (with no spilling support) to solely using the stack to pass
values between expressions.</p>
<p>This is a slightly longer post since we've got a lot of catching up to do
to get to feature parity with the LLVM backend. Namely:</p>
<ul>
<li>"Infinite" locals, parameters</li>
<li>Function definitions</li>
<li>Variable references</li>
<li>Arithmetic and logical operations</li>
<li>If</li>
<li>Syscalls</li>
</ul>
<p>We'll tackle the first four points first and finish up with the last
two. This way we can support the same fibonacci program that prints
integers to stdout that we support in the LLVM backend.</p>
<p>As always the <a href="https://github.com/eatonphil/ulisp">code is available on
GitHub</a>.</p>
<p>But first a digression into how this is suddenly easy for us to do
with x86 and one-pass (sorta) code generation.</p>
<h3 id="stack-based-languages">Stack-based languages</h3><p>Stack-based languages have the extremely convenient attribute that
values are (by default) stored on the stack, so a code generator
targeting one can skip register allocation entirely. And as it
happens, x86 has enough support (<code>PUSH</code>, <code>POP</code>) to make
it easy to treat as a stack machine.</p>
<p>As we build out the code generator for x86 as a stack machine we need
to keep two commitments in mind:</p>
<ul>
<li>Every expression must pop all arguments/operands</li>
<li>Every expression must store one result back on the stack</li>
</ul>
<p>In the future, we may replace the second commitment. But for now it is
more than enough.</p>
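<p>To make the two commitments concrete, here is a minimal sketch that
is not part of ulisp. The hypothetical <code>compileAdd</code> helper only
handles integer literals and two-argument additions, and the
hypothetical <code>run</code> function is a toy interpreter used to check
that exactly one value is left on the stack. The register choices
(<code>RAX</code>, <code>RBX</code>) are assumptions made for illustration.</p>

```javascript
// Hypothetical helper, not part of ulisp: compiles nested additions,
// e.g. ['+', 1, ['+', 2, 3]], into stack-machine style instructions.
function compileAdd(expr, out = []) {
  if (Number.isInteger(expr)) {
    out.push(`PUSH ${expr}`); // a literal stores exactly one result
    return out;
  }
  const [, left, right] = expr;
  compileAdd(left, out);
  compileAdd(right, out);
  out.push('POP RBX'); // pop the right operand
  out.push('POP RAX'); // pop the left operand
  out.push('ADD RAX, RBX');
  out.push('PUSH RAX'); // store one result back on the stack
  return out;
}

// Toy interpreter for the instructions above. It only understands
// PUSH, POP and ADD, which is all compileAdd emits.
function run(instructions) {
  const stack = [];
  const regs = {};
  for (const line of instructions) {
    const [op, ...rest] = line.replace(',', '').split(' ');
    if (op === 'PUSH') {
      const v = rest[0];
      stack.push(v in regs ? regs[v] : Number(v));
    } else if (op === 'POP') {
      regs[rest[0]] = stack.pop();
    } else if (op === 'ADD') {
      regs[rest[0]] += regs[rest[1]];
    }
  }
  return stack;
}

const program = compileAdd(['+', 1, ['+', 2, 3]]);
const finalStack = run(program);
console.log(program.join('\n'));
console.log(finalStack);
```

<p>Because every sub-expression pops its operands and pushes one
result, the outer addition can consume the inner one without knowing
anything about how it was computed.</p>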
<h3 id="boilerplate">Boilerplate</h3><p>We'll start with the existing x86 backend code and strip all the
implementation code:</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">cp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'child_process'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">os</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'os'</span><span class="p">);</span>
<span class="kd">let</span><span class="w"> </span><span class="nx">GLOBAL_COUNTER</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALL_MAP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">darwin</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">exit</span><span class="o">:</span><span class="w"> </span><span class="s1">'0x2000001'</span><span class="p">,</span>
<span class="w"> </span><span class="nx">write</span><span class="o">:</span><span class="w"> </span><span class="s1">'0x2000004'</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">linux</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">exit</span><span class="o">:</span><span class="w"> </span><span class="mf">60</span><span class="p">,</span>
<span class="w"> </span><span class="nx">write</span><span class="o">:</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="p">}[</span><span class="nx">os</span><span class="p">.</span><span class="nx">platform</span><span class="p">()];</span>
<span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{}</span>
<span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="k">if</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">prepareArithmeticWrappers</span><span class="p">(),</span>
<span class="w"> </span><span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">prepareLogicalWrappers</span><span class="p">(),</span>
<span class="w"> </span><span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">prepareSyscallWrappers</span><span class="p">(),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">prepareArithmeticWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">prepareLogicalWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">prepareSyscallWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">depth</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">undefined</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">undefined</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Invalid call to emit'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">indent</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Array</span><span class="p">(</span><span class="nx">depth</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="s1">' '</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">indent</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">args</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span class="nx">els</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">topLevel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">emitPrefix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'.global _main\n'</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'.text\n'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">emitPostfix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">'_main:'</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'CALL main'</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'MOV RDI, RAX'</span><span class="p">);</span><span class="w"> </span><span class="c1">// Set exit arg</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, </span><span class="si">${</span><span class="nx">SYSCALL_MAP</span><span class="p">[</span><span class="s1">'exit'</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'SYSCALL'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">getOutput</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">output</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s1">'\n'</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Leave at most one empty line</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">output</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/\n\n\n+/g</span><span class="p">,</span><span class="w"> </span><span class="s1">'\n\n'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Compiler</span><span class="p">();</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">emitPrefix</span><span class="p">();</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Scope</span><span class="p">();</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span><span class="p">);</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">emitPostfix</span><span class="p">();</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">getOutput</span><span class="p">();</span>
<span class="p">};</span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">build</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="p">(</span><span class="nx">buildDir</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prog</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'prog'</span><span class="p">;</span>
<span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">writeFileSync</span><span class="p">(</span><span class="sb">`</span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.s`</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">);</span>
<span class="w"> </span><span class="nx">cp</span><span class="p">.</span><span class="nx">execSync</span><span class="p">(</span>
<span class="w"> </span><span class="sb">`gcc -mstackrealign -masm=intel -o </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb"> </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.s`</span><span class="p">,</span>
<span class="w"> </span><span class="p">);</span>
<span class="p">};</span>
</pre></div>
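<p>The buffering done by <code>emit</code> and <code>getOutput</code> is easy to try in
isolation. The sketch below copies just that logic into a hypothetical
<code>MiniEmitter</code> class (the name and the one-space indent unit are
assumptions of this sketch) and shows how extra blank lines get
collapsed.</p>

```javascript
// Standalone copy of the buffering logic so it can be run on its own.
// MiniEmitter is a hypothetical name used only in this sketch.
class MiniEmitter {
  constructor() {
    this.outBuffer = [];
  }

  emit(depth, args) {
    if (depth === undefined || args === undefined) {
      throw new Error('Invalid call to emit');
    }
    // One space of indentation per depth level in this sketch.
    const indent = ' '.repeat(depth);
    this.outBuffer.push(indent + args);
  }

  getOutput() {
    // Collapse runs of three or more newlines so that at most one
    // empty line separates two lines of output.
    return this.outBuffer.join('\n').replace(/\n\n\n+/g, '\n\n');
  }
}

const e = new MiniEmitter();
e.emit(1, '.text\n'); // trailing newline asks for a blank line
e.emit(0, '');        // an empty line would otherwise add a second one
e.emit(0, '_main:');
e.emit(1, 'CALL main');
const assembly = e.getOutput();
console.log(assembly);
```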
<p>The prefix and postfix stay mostly the same as the original
implementation. But we'll assume a couple of new helpers to get us to
parity with the LLVM backend:</p>
<ul>
<li><code>compileDefine</code></li>
<li><code>compileBegin</code></li>
<li><code>compileIf</code></li>
<li><code>compileCall</code></li>
<li><code>prepareArithmeticWrappers</code></li>
<li><code>prepareLogicalWrappers</code></li>
<li><code>prepareSyscallWrappers</code></li>
</ul>
<p>The <code>prepareArithmeticWrappers</code> helper will define wrappers
for arithmetic instructions. The <code>prepareLogicalWrappers</code>
helper will define wrappers for logical instructions. And the
<code>prepareSyscallWrappers</code> helper will define a wrapper for
syscalls and generate builtins based on the <code>SYSCALL_MAP</code> entries.</p>
<h3 id="scope">Scope</h3><p>Similar to our LLVM backend's Context and Scope helpers we'll define
our own for the x86 backend. Since we're placing all locals on the
stack, the two most important things Scope will do for us are:</p>
<ul>
<li>Map identifiers to escaped strings</li>
<li>Store and increment the location of the local on the stack</li>
</ul>
<p>Here's what it will look like:</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">localOffset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">map</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">assign</span><span class="p">(</span><span class="nx">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">name</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/-/g</span><span class="p">,</span><span class="w"> </span><span class="s1">'_'</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">map</span><span class="p">[</span><span class="nx">safe</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">localOffset</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">safe</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">symbol</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">localOffset</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">lookup</span><span class="p">(</span><span class="nx">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">name</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/-/g</span><span class="p">,</span><span class="w"> </span><span class="s1">'_'</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">map</span><span class="p">[</span><span class="nx">safe</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">name</span><span class="o">:</span><span class="w"> </span><span class="nx">safe</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">map</span><span class="p">[</span><span class="nx">safe</span><span class="p">]</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">copy</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Scope</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// In the future we may need to store s.scopeOffset = this.scopeOffset + 1</span>
<span class="w"> </span><span class="c1">// so we can read outer-scoped values at runtime.</span>
<span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">map</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">map</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">s</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
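<p>Here is a short usage sketch of Scope. The class is repeated in
condensed form so the block runs on its own. One assumption of this
sketch is that escaping uses a global regex so every hyphen in a name
is replaced. The identifier names are made up for illustration.</p>

```javascript
// Condensed copy of the Scope listing so this runs on its own.
// Assumption of this sketch: a global regex escapes every hyphen.
class Scope {
  constructor() {
    this.localOffset = 1;
    this.map = {};
  }
  assign(name) {
    const safe = name.replace(/-/g, '_');
    this.map[safe] = this.localOffset++;
    return safe;
  }
  symbol() {
    return this.localOffset++;
  }
  lookup(name) {
    const safe = name.replace(/-/g, '_');
    if (this.map[safe]) {
      return { name: safe, offset: this.map[safe] };
    }
    return null;
  }
  copy() {
    const s = new Scope();
    s.map = { ...this.map };
    return s;
  }
}

const outer = new Scope();
const escaped = outer.assign('my-var'); // first local takes slot 1
const temp = outer.symbol();            // anonymous slot 2
outer.assign('n');                      // slot 3

// copy() snapshots the name map; names assigned in the copy
// are not visible through the original scope.
const inner = outer.copy();
inner.assign('only-inner');

const found = outer.lookup('my-var');
const missing = outer.lookup('only-inner');
const inherited = inner.lookup('n');
console.log(escaped, temp, found, missing, inherited);
```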
<h3 id="compileexpression">compileExpression</h3><p>An expression will be one of:</p>
<ul>
<li>A function call (possibly a builtin like <code>def</code> or <code>+</code>)</li>
<li>A literal value (e.g. <code>29</code>)</li>
<li>A reference (e.g. <code>&c</code>)</li>
<li>An identifier (e.g. <code>my-var</code>)</li>
</ul>
<p>We'll handle compiling an expression in that order. If the AST
argument passed to <code>compileExpression</code> is an array, we will
call <code>compileCall</code> and return.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Is a nested function call, compile it</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
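<p>The four kinds of expression listed above can be told apart with a
few type checks, made in the same order. The <code>classify</code> helper
below is hypothetical and not part of ulisp. It assumes the parser
hands us arrays for calls, JavaScript integers for literals and
strings for references and identifiers.</p>

```javascript
// Hypothetical helper, not part of ulisp: names the kind of AST node,
// checking in the order call, literal, reference, identifier.
function classify(arg) {
  if (Array.isArray(arg)) return 'call';
  if (Number.isInteger(arg)) return 'literal';
  if (typeof arg === 'string' && arg.startsWith('&')) return 'reference';
  if (typeof arg === 'string') return 'identifier';
  throw new Error(`Unsupported AST node: ${JSON.stringify(arg)}`);
}

const kinds = [['+', 1, 2], 29, '&c', 'my-var'].map(classify);
console.log(kinds);
```

<p>The order matters for the last two checks: a reference is also a
string, so it has to be tested before the plain identifier case.</p>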
<p>If the AST is an integer, we will push it onto the stack and
return.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Is a nested function call, compile it</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Number</span><span class="p">.</span><span class="nx">isInteger</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>If the AST is a string that starts with <code>&</code> we will look up
the location of the identifier after the <code>&</code>, push its
<em>location</em> onto the stack and return.</p>
<p>We count on the Scope storing each identifier's offset from the "frame pointer",
which we will later set up to be stored in <code>RBP</code>.</p>
<p>Locals will be stored after the frame pointer (positive offsets in
the Scope) and parameters will be stored before it (negative offsets).
Since the x86 stack grows toward lower addresses, we subtract from the
frame pointer to reach a local and add to it to reach a parameter.</p>
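<p>As a worked example of that arithmetic, here is a small sketch. The
<code>addressOf</code> helper is hypothetical. It assumes 8-byte stack slots
and that parameters carry negative offsets while locals carry positive
ones.</p>

```javascript
// Hypothetical helper mirroring the offset arithmetic described above.
// Assumes 8-byte stack slots and that RBP holds the frame pointer.
function addressOf(offset) {
  // Negative offsets are reached by adding to RBP,
  // positive offsets by subtracting from it.
  const operation = offset < 0 ? 'ADD' : 'SUB';
  return ['MOV RAX, RBP', `${operation} RAX, ${Math.abs(offset * 8)}`];
}

const localAddress = addressOf(2);  // second local slot
const paramAddress = addressOf(-3); // a slot on the parameter side
console.log(localAddress.join('\n'));
console.log(paramAddress.join('\n'));
```

<p>So a local at offset 2 lives 16 bytes below the frame pointer, and a
slot at offset -3 lives 24 bytes above it.</p>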
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Is a nested function call, compile it</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Number</span><span class="p">.</span><span class="nx">isInteger</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">arg</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'&'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">arg</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="mf">1</span><span class="p">));</span>
<span class="w"> </span><span class="c1">// Copy the frame pointer so we can return an offset from it</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, RBP`</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s1">'ADD'</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s1">'SUB'</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">operation</span><span class="si">}</span><span class="sb"> RAX, </span><span class="si">${</span><span class="nb">Math</span><span class="p">.</span><span class="nx">abs</span><span class="p">(</span><span class="nx">offset</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">8</span><span class="p">)</span><span class="si">}</span><span class="sb"> # </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Finally, we'll look up the identifier and copy its value (stored at
its offset from the frame pointer) to the top of the stack. If the
lookup fails, we throw an error.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Is a nested function call, compile it</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Number</span><span class="p">.</span><span class="nx">isInteger</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">arg</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'&'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">arg</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="mf">1</span><span class="p">));</span>
<span class="w"> </span><span class="c1">// Copy the frame pointer so we can return an offset from it</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, RBP`</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s1">'ADD'</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s1">'SUB'</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">operation</span><span class="si">}</span><span class="sb"> RAX, </span><span class="si">${</span><span class="nb">Math</span><span class="p">.</span><span class="nx">abs</span><span class="p">(</span><span class="nx">offset</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">8</span><span class="p">)</span><span class="si">}</span><span class="sb"> # </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Variable lookup</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">arg</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">offset</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s1">'+'</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s1">'-'</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span>
<span class="w"> </span><span class="nx">depth</span><span class="p">,</span>
<span class="w"> </span><span class="sb">`PUSH [RBP </span><span class="si">${</span><span class="nx">operation</span><span class="si">}</span><span class="sb"> </span><span class="si">${</span><span class="nb">Math</span><span class="p">.</span><span class="nx">abs</span><span class="p">(</span><span class="nx">offset</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">8</span><span class="p">)</span><span class="si">}</span><span class="sb">] # </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">,</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span>
<span class="w"> </span><span class="s1">'Attempt to reference undefined variable or unsupported literal: '</span><span class="w"> </span><span class="o">+</span>
<span class="w"> </span><span class="nx">arg</span><span class="p">,</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
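<p>To make the three non-call cases concrete, here is a minimal sketch
of the instructions each one produces. The helper
<code>sketchExpression</code> and the plain <code>offsets</code> object
are hypothetical stand-ins for the compiler method and the scope, and
the variable layout is an assumption chosen for illustration.</p>

```javascript
// Hypothetical helper, not part of the compiler above: it returns the
// instructions compileExpression would emit for a non-call argument,
// given a plain object mapping variable names to frame offsets.
function sketchExpression(arg, offsets) {
  if (Number.isInteger(arg)) {
    return [`PUSH ${arg}`];
  }
  if (arg.startsWith('&')) {
    // Address-of: compute RBP plus or minus the slot offset in RAX
    const offset = offsets[arg.substring(1)];
    const operation = offset < 0 ? 'ADD' : 'SUB';
    return [
      'MOV RAX, RBP',
      `${operation} RAX, ${Math.abs(offset * 8)} # ${arg}`,
      'PUSH RAX',
    ];
  }
  // Variable lookup: push the value stored in the slot
  const offset = offsets[arg];
  if (!offset) {
    throw new Error(
      'Attempt to reference undefined variable or unsupported literal: ' + arg,
    );
  }
  const operation = offset < 0 ? '+' : '-';
  return [`PUSH [RBP ${operation} ${Math.abs(offset * 8)}] # ${arg}`];
}

// Assumed layout: `n` is one slot below RBP, `m` is two slots above it.
const offsets = { n: 1, m: -2 };
console.log(sketchExpression(42, offsets)); // PUSH 42
console.log(sketchExpression('n', offsets)); // PUSH [RBP - 8] # n
console.log(sketchExpression('m', offsets)); // PUSH [RBP + 16] # m
console.log(sketchExpression('&n', offsets)); // MOV, SUB RAX, 8, PUSH RAX
```

<p>Every case ends with exactly one new value on the stack, which is
what callers of <code>compileExpression</code> rely on.</p>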
<p>And that's it for handling expressions! Let's add
<code>compileCall</code> support now that we've referenced it.</p>
<h3 id="compilecall">compileCall</h3><p>A call will first check if the call is a builtin. If so, it will
immediately pass control to the builtin.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Otherwise it will compile every argument to the call, which will
leave all the resulting values on the stack.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Compile registers and store on the stack</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Then we will check that the function is defined and call it.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Compile registers and store on the stack</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">));</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">fun</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`CALL </span><span class="si">${</span><span class="nx">fn</span><span class="p">.</span><span class="nx">name</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Attempt to call undefined function: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fun</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Then we'll reset the stack pointer (to maintain our commitment) based
on the number of arguments and push <code>RAX</code> (where the return
result of the function call will be stored) onto the stack. We'll make
two minor optimizations for when there is exactly zero or one argument
to the function.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Compile registers and store on the stack</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">));</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">fun</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`CALL </span><span class="si">${</span><span class="nx">fn</span><span class="p">.</span><span class="nx">name</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Attempt to call undefined function: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fun</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Drop the args</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`ADD RSP, </span><span class="si">${</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">8</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV [RSP], RAX\n`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">'PUSH RAX\n'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>When there is only one argument, we can just set the top value on the
stack to be the return result of the call rather than resetting the
stack pointer just to push onto it.</p>
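<p>As a rough check of that reasoning, the sketch below mirrors only
the tail of <code>compileCall</code> and counts stack slots. The names
<code>sketchCallCleanup</code> and <code>netSlots</code> are
hypothetical helpers for illustration, not part of the compiler.</p>

```javascript
// Hypothetical helper mirroring the tail of compileCall: given the
// number of arguments, return the instructions emitted after the CALL.
function sketchCallCleanup(argCount) {
  const out = [];
  if (argCount > 1) {
    // Drop the args
    out.push(`ADD RSP, ${argCount * 8}`);
  }
  if (argCount === 1) {
    // Overwrite the single argument slot with the result
    out.push('MOV [RSP], RAX');
  } else {
    out.push('PUSH RAX');
  }
  return out;
}

// Count how many 8-byte slots remain on the stack after the arguments
// were pushed and the cleanup instructions above have run.
function netSlots(argCount) {
  let slots = argCount;
  for (const line of sketchCallCleanup(argCount)) {
    if (line.startsWith('ADD RSP')) slots -= argCount;
    if (line === 'PUSH RAX') slots += 1;
  }
  return slots;
}

for (const n of [0, 1, 3]) {
  console.log(n, sketchCallCleanup(n).join('; '), '=> slots:', netSlots(n));
}
```

<p>In all three cases exactly one slot is left on the stack: the return
value of the call.</p>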
<p>And that's it for <code>compileCall</code>! Now that we've got a feel
for expressions and function calls, let's add some simple arithmetic
operations.</p>
<h3 id="preparearithmeticwrappers">prepareArithmeticWrappers</h3><p>There are two major kinds of arithmetic instructions we'll wrap for now:</p>
<ul>
<li>"General" instructions that operate on two arguments, putting the
return result in the first argument</li>
<li>"RAX" instructions that operate on RAX and the first argument,
putting the return result in <code>RAX</code> and possibly
<code>RDX</code></li>
</ul>
<h4 id="preparegeneral">prepareGeneral</h4><p>This helper will compile its two arguments and pop the second argument
into <code>RAX</code>. This is because x86 instructions typically
allow at most one operand to be a memory address, so the other has to
be a register.</p>
<p>We'll use the stack address as the first argument so that 1)
non-commutative operations are correct and 2) the result is stored
right back onto the stack in the right location.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile first argument</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile second argument</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile operation</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb"> [RSP], RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# End </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
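<p>To see why the operand order matters, here is a toy model of the
stack and <code>RAX</code> that runs the instructions
<code>prepareGeneral</code> would emit for a subtraction of 3 from 10.
The <code>run</code> function is a hypothetical illustration that only
understands the few instruction shapes used here. It is not an x86
emulator.</p>

```javascript
// Toy model of the stack and RAX, only for illustrating operand order.
// It understands just the handful of instruction shapes used below.
function run(instructions) {
  const stack = [];
  let rax = 0;
  for (const line of instructions) {
    const [op, ...rest] = line.split(/[\s,]+/);
    const top = stack.length - 1;
    if (op === 'PUSH') stack.push(Number(rest[0]));
    else if (op === 'POP') rax = stack.pop();
    else if (op === 'ADD') stack[top] += rax; // ADD [RSP], RAX
    else if (op === 'SUB') stack[top] -= rax; // SUB [RSP], RAX
    else throw new Error('Unsupported instruction: ' + line);
  }
  return stack;
}

// First argument 10, second argument 3.
const program = ['PUSH 10', 'PUSH 3', 'POP RAX', 'SUB [RSP], RAX'];
console.log(run(program)); // leaves a single value, 7, on the stack
```

<p>Because the first argument stays on the stack as the destination,
the result is <code>10 - 3</code> rather than <code>3 - 10</code>, and
it already sits where the caller expects it.</p>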
<h4 id="preparerax">prepareRax</h4><p>This helper will similarly compile its two arguments and pop
the second argument into <code>RAX</code>. But the RAX-implicit
instructions require the first argument to be in <code>RAX</code>,
so we'll use the <code>XCHG</code> instruction to swap <code>RAX</code>
with the value on the top of the stack (the first argument).</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareRax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">,</span><span class="w"> </span><span class="nx">outRegister</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">arg</span><span class="p">,</span>
<span class="w"> </span><span class="nx">scope</span><span class="p">,</span>
<span class="w"> </span><span class="nx">depth</span><span class="p">,</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile first argument, store in RAX</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile second argument</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// POP second argument and swap with first</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`XCHG [RSP], RAX`</span><span class="p">);</span>
</pre></div>
<p>This may seem roundabout but remember that we <em>must</em> pop all
arguments to the instruction to maintain our commitment.</p>
<p>Next we'll zero out the <code>RDX</code> register if the operation is
<code>DIV</code> (since <code>DIV</code> divides the 128-bit value in
<code>RDX:RAX</code>), perform the operation, and store the result on the
top of the stack.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareRax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">,</span><span class="w"> </span><span class="nx">outRegister</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">arg</span><span class="p">,</span>
<span class="w"> </span><span class="nx">scope</span><span class="p">,</span>
<span class="w"> </span><span class="nx">depth</span><span class="p">,</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile first argument (swapped into RAX below)</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile second argument</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// POP second argument and swap with first</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`XCHG [RSP], RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Reset RDX for DIV</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">'DIV'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`XOR RDX, RDX`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Compile operation</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb"> QWORD PTR [RSP]`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Replace the top of the stack with the result</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV [RSP], </span><span class="si">${</span><span class="nx">outRegister</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">};</span>
</pre></div>
<p>We parameterize the out register because the <code>%</code> wrapper
will call <code>DIV</code> but needs <code>RDX</code>, where
<code>DIV</code> leaves the remainder, rather than
<code>RAX</code> after the operation.</p>
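<p>To see why the swap and the out register matter, here is a minimal
sketch that models the emitted sequence in plain JavaScript. The function
name <code>runDivSequence</code>, the array standing in for the stack, and
the object standing in for the registers are all assumptions made for
illustration; they are not part of the compiler.</p>

```javascript
// Hypothetical model of the emitted sequence: a JavaScript array stands in
// for the stack and a plain object for the registers.
function runDivSequence(first, second, outRegister = 'RAX') {
  const reg = { RAX: 0n, RDX: 0n };
  const stack = [];
  const top = () => stack.length - 1;

  // The two compileExpression calls are assumed to leave both operands
  // on the stack, first argument below the second.
  stack.push(BigInt(first));
  stack.push(BigInt(second));

  // POP RAX
  reg.RAX = stack.pop();
  // XCHG [RSP], RAX
  [stack[top()], reg.RAX] = [reg.RAX, stack[top()]];
  // XOR RDX, RDX
  reg.RDX = 0n;
  // DIV QWORD PTR [RSP]: divides RDX:RAX by the operand
  const dividend = (reg.RDX << 64n) | reg.RAX;
  const divisor = stack[top()];
  reg.RAX = dividend / divisor; // quotient
  reg.RDX = dividend % divisor; // remainder
  // MOV [RSP], outRegister
  stack[top()] = reg[outRegister];

  return stack;
}

console.log(runDivSequence(7, 2).map(String));
console.log(runDivSequence(7, 2, 'RDX').map(String));
```

<p>For <code>7</code> and <code>2</code> the model leaves a single value
on the stack in both cases: the quotient <code>3</code> when the out
register is <code>RAX</code> and the remainder <code>1</code> when it is
<code>RDX</code>.</p>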
<h4 id="preparearithmeticwrappers">prepareArithmeticWrappers</h4><p>Putting everything together we get:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">prepareArithmeticWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// General operations</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile first argument</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile second argument</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile operation</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb"> [RSP], RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# End </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="c1">// Operations that use RAX implicitly</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareRax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">,</span><span class="w"> </span><span class="nx">outRegister</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">arg</span><span class="p">,</span>
<span class="w"> </span><span class="nx">scope</span><span class="p">,</span>
<span class="w"> </span><span class="nx">depth</span><span class="p">,</span>
<span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile first argument (swapped into RAX below)</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile second argument</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// POP second argument and swap with first</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`XCHG [RSP], RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Reset RDX for DIV</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">'DIV'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`XOR RDX, RDX`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Compile operation</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb"> QWORD PTR [RSP]`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Replace the top of the stack with the result</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV [RSP], </span><span class="si">${</span><span class="nx">outRegister</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s1">'+'</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="p">(</span><span class="s1">'add'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'-'</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="p">(</span><span class="s1">'sub'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'&'</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="p">(</span><span class="s1">'and'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'|'</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="p">(</span><span class="s1">'or'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'='</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="p">(</span><span class="s1">'mov'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'*'</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareRax</span><span class="p">(</span><span class="s1">'mul'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'/'</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareRax</span><span class="p">(</span><span class="s1">'div'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'%'</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareRax</span><span class="p">(</span><span class="s1">'div'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RDX'</span><span class="p">),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
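<p>As a rough illustration of what a general wrapper emits, here is a
self-contained sketch. <code>SketchCompiler</code> is a hypothetical
stand-in: its <code>emit</code> (two spaces of indentation per depth
level) and its <code>compileExpression</code> (a bare <code>PUSH</code> of
an integer literal) are assumptions, and the real methods may behave
differently.</p>

```javascript
// Hypothetical stand-in for the compiler: only integer literals are
// supported and emitted lines are collected in an array.
class SketchCompiler {
  constructor() {
    this.lines = [];
  }

  // Assumed behavior: indent by depth and record the line.
  emit(depth, code) {
    this.lines.push('  '.repeat(depth) + code);
  }

  // Assumed behavior: every expression leaves its value on the stack.
  compileExpression(arg, scope, depth) {
    this.emit(depth, `PUSH ${arg}`);
  }

  // Same shape as prepareGeneral above.
  prepareGeneral(instruction) {
    return (arg, scope, depth) => {
      depth++;
      this.emit(depth, `# ${instruction.toUpperCase()}`);
      this.compileExpression(arg[0], scope, depth);
      this.compileExpression(arg[1], scope, depth);
      this.emit(depth, `POP RAX`);
      this.emit(depth, `${instruction.toUpperCase()} [RSP], RAX`);
      this.emit(depth, `# End ${instruction.toUpperCase()}`);
    };
  }
}

// Emit the instructions for (+ 1 2).
const sketch = new SketchCompiler();
sketch.prepareGeneral('add')([1, 2], {}, 0);
console.log(sketch.lines.join('\n'));
```

<p>Two values are pushed, one is popped, and the operation writes its
result over the remaining one, so the wrapper leaves exactly one value on
the stack.</p>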
<p>Next we'll tackle <code>compileBegin</code> and
<code>compileDefine</code>.</p>
<h3 id="compilebegin">compileBegin</h3><p>A begin form is an expression made up of a series of expressions where
the values of all but the last expression are thrown away and the last
expression's value is the result of the begin form.</p>
<p>To compile this form we will compile each expression passed in and pop
from the stack to throw its value away. If the expression is the
last in the list we will not pop since it is the result of the begin
form.</p>
<p>We will add one exception to this popping logic: if the begin is
called from the top-level we will omit the popping.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">topLevel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">body</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">topLevel</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">body</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX # Ignore non-final expression`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>That's it for <code>compileBegin</code>!</p>
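<p>The popping rule is easy to check in isolation. The sketch below wraps
the same loop in a hypothetical helper, <code>sketchBegin</code>, with a
stubbed <code>emit</code> and <code>compileExpression</code> (both
assumptions) so that it runs on its own.</p>

```javascript
// Hypothetical helper: returns the lines a begin form would emit when each
// expression compiles to a single PUSH (an assumption for illustration).
function sketchBegin(body, topLevel = false) {
  const lines = [];
  const depth = 0;
  const emit = (d, code) => lines.push('  '.repeat(d) + code);
  const compileExpression = (expression, scope, d) => emit(d, `PUSH ${expression}`);

  body.forEach((expression, i) => {
    compileExpression(expression, {}, depth);
    // Pop every value except the last, unless we are at the top level.
    if (!topLevel && i < body.length - 1) {
      emit(depth, `POP RAX # Ignore non-final expression`);
    }
  });

  return lines;
}

console.log(sketchBegin([1, 2, 3]).join('\n'));
console.log(sketchBegin([1, 2, 3], true).join('\n'));
```

<p>In the first call only the final <code>PUSH 3</code> is left without a
matching pop; in the top-level call no pops are emitted at all.</p>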
<h3 id="compiledefine">compileDefine</h3><p>The prelude for a function definition will add its name to scope, push
the current frame pointer (<code>RBP</code>) onto the stack and store
the current stack pointer (<code>RSP</code>) as the new frame pointer
(<code>RBP</code>).</p>
<p>Remember that we use the frame pointer as a point of reference when
setting and getting local and parameter values. It works out entirely
by convention.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Add this function to outer scope</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">assign</span><span class="p">(</span><span class="nx">name</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren't exposed in outer scope.</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">copy</span><span class="p">();</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">safe</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RBP`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RBP, RSP\n`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Next we map each parameter into local scope at its negative offset
(from the frame pointer). In the future we may decide to actually
copy the parameter <em>values</em> into the local stack but for now
there's no benefit.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Add this function to outer scope</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">assign</span><span class="p">(</span><span class="nx">name</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren't exposed in outer scope.</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">copy</span><span class="p">();</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">safe</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RBP`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RBP, RSP\n`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Copy params into local scope</span>
<span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">param</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">childScope</span><span class="p">.</span><span class="nx">map</span><span class="p">[</span><span class="nx">param</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">-</span><span class="mf">1</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="nx">params</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">2</span><span class="p">);</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
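<p>Here is the offset formula worked through on its own. The helper name
<code>paramOffsets</code> is hypothetical. The reading of the
<code>+ 2</code> is also an assumption: it presumably skips the saved
<code>RBP</code> and the return address that sit between the frame pointer
and the last argument.</p>

```javascript
// Hypothetical helper that applies the same formula as compileDefine.
function paramOffsets(params) {
  const map = {};
  params.forEach((param, i) => {
    // The last parameter is closest to the frame pointer.
    map[param] = -1 * (params.length - i - 1 + 2);
  });
  return map;
}

console.log(paramOffsets(['a', 'b', 'c']));
```

<p>For three parameters <code>a</code>, <code>b</code> and <code>c</code>
the offsets are <code>-4</code>, <code>-3</code> and <code>-2</code>: the
first parameter is furthest from the frame pointer and the last is
nearest.</p>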
<p>Next we'll compile the body of the function within a
<code>begin</code> block.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Add this function to outer scope</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">assign</span><span class="p">(</span><span class="nx">name</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren't exposed in outer scope.</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">copy</span><span class="p">();</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">safe</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RBP`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RBP, RSP\n`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Copy params into local scope</span>
<span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">param</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">childScope</span><span class="p">.</span><span class="nx">map</span><span class="p">[</span><span class="nx">param</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">-</span><span class="mf">1</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="nx">params</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">2</span><span class="p">);</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="c1">// Pass childScope in for reference when body is compiled.</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">childScope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Then in the postlude we'll pop the stack (for the return result of the
begin form) into <code>RAX</code>, pop the previous frame pointer back
into <code>RBP</code> to restore the calling frame, and return.</p>
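<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Add this function to outer scope</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">assign</span><span class="p">(</span><span class="nx">name</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren't exposed in outer scope.</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">copy</span><span class="p">();</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">safe</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RBP`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RBP, RSP\n`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Copy params into local scope</span>
<span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">param</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">childScope</span><span class="p">.</span><span class="nx">map</span><span class="p">[</span><span class="nx">param</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">-</span><span class="mf">1</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="nx">params</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">2</span><span class="p">);</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="c1">// Pass childScope in for reference when body is compiled.</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">childScope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Save the return value</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RBP\n`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">'RET\n'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>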
</pre></div>
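<p>To make the parameter mapping concrete, here is a minimal sketch that
reproduces just the slot arithmetic from <code>compileDefine</code>. The
helper name <code>paramSlots</code> is hypothetical. The <code>+ 2</code> is
consistent with skipping two stack words above RBP (the saved frame pointer
and the return address), but how the scope turns these slot numbers into
addresses is not shown in this section, so treat that reading as an
assumption.</p>

```javascript
// Hypothetical helper: reproduces only the parameter-to-slot mapping
// from compileDefine, without emitting any assembly.
function paramSlots(params) {
  const map = {};
  params.forEach((param, i) => {
    // Same formula as in compileDefine.
    map[param] = -1 * (params.length - i - 1 + 2);
  });
  return map;
}

// The last parameter gets the slot closest to the frame pointer.
console.log(paramSlots(['a', 'b', 'c'])); // { a: -4, b: -3, c: -2 }
```

<p>With a single parameter the mapping is just <code>-2</code>, since nothing
else sits between it and the two words above the frame pointer.</p>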
<p>And now we're ready to compile a simple program!</p>
<h3 id="our-first-program">Our first program</h3><p>Here's a simple one we can support:</p>
<div class="highlight"><pre><span></span><span class="nv">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="nv">tests/meaning-of-life.lisp</span>
<span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">main</span><span class="w"> </span><span class="p">()</span>
<span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="p">(</span><span class="nb">*</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="mi">17</span><span class="p">)))</span>
</pre></div>
<p>We'll compile this program without the ulisp kernel (which contains a
lisp library we cannot currently compile):</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/meaning-of-life.lisp<span class="w"> </span>--no-kernel<span class="w"> </span>--backend<span class="w"> </span>x86
$<span class="w"> </span>./build/prog
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">42</span>
</pre></div>
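<p>As a sanity check on that exit code, here is a small sketch that evaluates
the same expression in JavaScript. The <code>evaluate</code> helper is
hypothetical and only handles <code>+</code> and <code>*</code>. It assumes
the value returned from <code>main</code> becomes the process exit status,
of which the shell reports only the low 8 bits.</p>

```javascript
// Hypothetical mini-evaluator for the two arithmetic forms in the example.
// Expressions are nested arrays: [operator, ...arguments].
function evaluate(expr) {
  if (!Array.isArray(expr)) return expr;
  const [op, ...args] = expr;
  const values = args.map(evaluate);
  if (op === '+') return values.reduce((a, b) => a + b, 0);
  if (op === '*') return values.reduce((a, b) => a * b, 1);
  throw new Error(`Unsupported operator: ${op}`);
}

const result = evaluate(['+', 8, ['*', 2, 17]]);
// Assumption: the exit status keeps only the low 8 bits of the result.
const exitStatus = result & 0xff;
console.log(result, exitStatus); // 42 42
```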
<p>Not bad. Let's finish up with support for
<code>prepareLogicalWrappers</code>,
<code>prepareSyscallWrappers</code>, and <code>compileIf</code>.</p>
<h3 id="preparelogicalwrappers">prepareLogicalWrappers</h3><p>Storing logical results as values is a bit of a pain. Most of the
internet wants you to use branching. And a better compiler may
optimize an idiom like <code>(if (> 5 2) ...)</code> into a single
branch.</p>
<p>But we're going to resort to an instruction I just learned about
called <code>CMOV</code>. This allows us to conditionally assign a
value based on flags, similar to how you can conditionally branch.</p>
<p>Otherwise we'll follow a pattern similar to our arithmetic
wrappers. At the end of the procedure we will have a 0 or a 1 on the
top of the stack.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">prepareLogicalWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareComparison</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">operator</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">[</span><span class="nx">operator</span><span class="p">]</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">operator</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile first argument; its result is left on the stack</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile second argument, then pop its result into RAX</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile operation</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`CMP [RSP], RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Reset RAX to serve as CMOV* dest, MOV to keep flags (vs. XOR)</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, 0`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Conditional set [RSP]</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s1">'>'</span><span class="o">:</span><span class="w"> </span><span class="s1">'CMOVA'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'>='</span><span class="o">:</span><span class="w"> </span><span class="s1">'CMOVAE'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'<'</span><span class="o">:</span><span class="w"> </span><span class="s1">'CMOVB'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'<='</span><span class="o">:</span><span class="w"> </span><span class="s1">'CMOVBE'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'=='</span><span class="o">:</span><span class="w"> </span><span class="s1">'CMOVE'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'!='</span><span class="o">:</span><span class="w"> </span><span class="s1">'CMOVNE'</span><span class="p">,</span>
<span class="w"> </span><span class="p">}[</span><span class="nx">operator</span><span class="p">];</span>
<span class="w"> </span><span class="c1">// CMOV* requires the source to be memory or register</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV QWORD PTR [RSP], 1`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// CMOV* requires the dest to be a register</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">operation</span><span class="si">}</span><span class="sb"> RAX, [RSP]`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV [RSP], RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# End </span><span class="si">${</span><span class="nx">operator</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">'>'</span><span class="p">),</span>
<span class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">'>='</span><span class="p">),</span>
<span class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">'<'</span><span class="p">),</span>
<span class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">'<='</span><span class="p">),</span>
<span class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">'=='</span><span class="p">),</span>
<span class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">'!='</span><span class="p">),</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
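<p>To see what value the emitted sequence leaves on the stack, here is a
rough model of it in plain JavaScript. The <code>compare</code> function is
a hypothetical stand-in, not part of the compiler. One detail worth knowing:
<code>CMOVA</code>, <code>CMOVAE</code>, <code>CMOVB</code> and
<code>CMOVBE</code> are the unsigned condition codes, so the model treats
both operands as unsigned 64-bit values.</p>

```javascript
// Hypothetical model of the comparison wrappers: returns the 0 or 1 that
// ends up on top of the stack. Operands are interpreted as unsigned 64-bit
// integers because the CMOVA/CMOVB family tests the unsigned conditions.
function compare(operator, first, second) {
  const a = BigInt.asUintN(64, BigInt(first));
  const b = BigInt.asUintN(64, BigInt(second));
  const conditions = {
    '>': a > b,
    '>=': a >= b,
    '<': a < b,
    '<=': a <= b,
    '==': a === b,
    '!=': a !== b,
  };
  if (!(operator in conditions)) {
    throw new Error(`Unsupported operator: ${operator}`);
  }
  // RAX starts at 0 and only becomes 1 when the condition holds.
  return conditions[operator] ? 1 : 0;
}

console.log(compare('>', 5, 2)); // 1
console.log(compare('<', -1, 2)); // 0, since -1 is a huge unsigned value
```

<p>If signed comparisons were wanted, the signed counterparts
(<code>CMOVG</code>, <code>CMOVGE</code>, <code>CMOVL</code>,
<code>CMOVLE</code>) would be the instructions to reach for.</p>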
<h3 id="preparesyscallwrappers">prepareSyscallWrappers</h3><p>This helper is similar to <code>compileCall</code> except that it
needs to follow the System V ABI convention for system calls and use the <code>SYSCALL</code>
instruction rather than <code>CALL</code>.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">prepareSyscallWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">registers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s1">'RDI'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RSI'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RDX'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R10'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R8'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R9'</span><span class="p">];</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">wrappers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="nb">Object</span><span class="p">.</span><span class="nx">keys</span><span class="p">(</span><span class="nx">SYSCALL_MAP</span><span class="p">).</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">key</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">wrappers</span><span class="p">[</span><span class="sb">`syscall/</span><span class="si">${</span><span class="nx">key</span><span class="si">}</span><span class="sb">`</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="nx">registers</span><span class="p">.</span><span class="nx">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="sb">`Too many arguments to syscall/</span><span class="si">${</span><span class="nx">key</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Compile first</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">arg</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">));</span>
<span class="w"> </span><span class="c1">// Then pop to avoid possible register contention</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP </span><span class="si">${</span><span class="nx">registers</span><span class="p">[</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">),</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, </span><span class="si">${</span><span class="nx">SYSCALL_MAP</span><span class="p">[</span><span class="nx">key</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">'SYSCALL'</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">wrappers</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
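<p>Here is a standalone sketch of the instructions such a wrapper would emit
once the arguments have been compiled and pushed. The
<code>syscallLines</code> helper and the <code>SYSCALL_NUMBERS</code> table
are hypothetical. The numbers are the Linux x86-64 ones, which is an
assumption since <code>SYSCALL_MAP</code> is not shown in this section.</p>

```javascript
// Assumed Linux x86-64 syscall numbers; the compiler's own SYSCALL_MAP
// is not shown in this section and may differ.
const SYSCALL_NUMBERS = { write: 1, exit: 60 };
const REGISTERS = ['RDI', 'RSI', 'RDX', 'R10', 'R8', 'R9'];

// Hypothetical helper: returns the lines emitted after the arguments
// have already been compiled and pushed onto the stack.
function syscallLines(name, argCount) {
  if (!(name in SYSCALL_NUMBERS)) {
    throw new Error(`Unknown syscall: ${name}`);
  }
  if (argCount > REGISTERS.length) {
    throw new Error(`Too many arguments to syscall/${name}`);
  }
  const lines = [];
  // Arguments were pushed left to right, so the last one is popped first.
  for (let i = 0; i < argCount; i++) {
    lines.push(`POP ${REGISTERS[argCount - i - 1]}`);
  }
  lines.push(`MOV RAX, ${SYSCALL_NUMBERS[name]}`, 'SYSCALL', 'PUSH RAX');
  return lines;
}

console.log(syscallLines('write', 3).join('\n'));
```

<p>For a three-argument <code>write</code>, the pops go to RDX, RSI, then
RDI, so the first argument ends up in RDI as the convention expects.</p>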
<p>And we're set! Last up is <code>compileIf</code>.</p>
<h3 id="compileif">compileIf</h3><p>This is standard code generation but gets a little tricky due to our
stack commitments. Testing must pop the test value off the stack. And
then/else blocks must <em>push</em> a value onto the stack (even if
there is no else block).</p>
<p>Here is an example we'd like to support:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nv">foo</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nv">do-bar</span><span class="p">))</span>
</pre></div>
<p>We compile the test and branch:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span class="nx">els</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">'# If'</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile test</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">branch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sb">`else_branch`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">GLOBAL_COUNTER</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Must pop/use up argument in test</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`TEST RAX, RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`JZ .</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">\n`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Then we compile the then block and jump past the else block
once it finishes.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span class="nx">els</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">'# If'</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile test</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">branch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sb">`else_branch`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">GLOBAL_COUNTER</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Must pop/use up argument in test</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`TEST RAX, RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`JZ .</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">\n`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile then section</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# If then`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`JMP .after_</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">\n`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Finally we compile the else block if it exists, and otherwise we push
a zero onto the stack (possibly to represent null).</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span class="nx">els</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">'# If'</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile test</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">branch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sb">`else_branch`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">GLOBAL_COUNTER</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Must pop/use up argument in test</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`TEST RAX, RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`JZ .</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">\n`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile then section</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# If then`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`JMP .after_</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">\n`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile else section</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# If else`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`.</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">els</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">els</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'PUSH 0 # Null else branch'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`.after_</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">'# End if'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
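<p>Putting the three steps together, the layout of the generated assembly
can be sketched with a standalone function that works on already-compiled
lines. The name <code>ifLines</code> is hypothetical, and the
<code>CALL</code>/<code>PUSH RAX</code> pairs below are placeholders for
whatever <code>compileExpression</code> actually emits for
<code>(foo)</code> and <code>(do-bar)</code>.</p>

```javascript
// Hypothetical stand-in for compileIf that takes pre-compiled line arrays
// instead of lisp expressions, to show the label and jump layout.
function ifLines(testLines, thenLines, elseLines, id) {
  const branch = `else_branch${id}`;
  return [
    ...testLines,
    // Pop the test result and branch to the else label when it is zero.
    'POP RAX',
    'TEST RAX, RAX',
    `JZ .${branch}`,
    ...thenLines,
    // Skip the else block once the then block has run.
    `JMP .after_${branch}`,
    `.${branch}:`,
    // A missing else block still has to push a value.
    ...(elseLines || ['PUSH 0']),
    `.after_${branch}:`,
  ];
}

// Placeholder instruction sequences for (foo) and (do-bar).
const lines = ifLines(
  ['CALL foo', 'PUSH RAX'],
  ['CALL do_bar', 'PUSH RAX'],
  null,
  0,
);
console.log(lines.join('\n'));
```

<p>Either path through the labels leaves exactly one value on the stack,
which is the invariant the rest of the compiler relies on.</p>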
<p>And we're ready for an interesting program! Let's print (to stdout)
the result of <code>fib(20)</code>.</p>
<h3 id="fibonacci">Fibonacci</h3><div class="highlight"><pre><span></span><span class="nv">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="o">.</span><span class="nv">/tests/fib.lisp</span>
<span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">fib</span><span class="w"> </span><span class="p">(</span><span class="nv">n</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb"><</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span>
<span class="w"> </span><span class="nv">n</span>
<span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="p">(</span><span class="nv">fib</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="p">(</span><span class="nv">fib</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">2</span><span class="p">)))))</span>
<span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">main</span><span class="w"> </span><span class="p">()</span>
<span class="w"> </span><span class="p">(</span><span class="nb">print</span><span class="w"> </span><span class="p">(</span><span class="nv">fib</span><span class="w"> </span><span class="mi">20</span><span class="p">)))</span>
</pre></div>
<p>And check out the kernel:</p>
<div class="highlight"><pre><span></span><span class="nv">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="o">.</span><span class="nv">/lib/kernel.lisp</span>
<span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">print-char</span><span class="w"> </span><span class="p">(</span><span class="nv">c</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nv">syscall/write</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="nv">&c</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span>
<span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nb">print</span><span class="w"> </span><span class="p">(</span><span class="nv">n</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">></span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">9</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nb">print</span><span class="w"> </span><span class="p">(</span><span class="nb">/</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">10</span><span class="p">)))</span>
<span class="w"> </span><span class="p">(</span><span class="nv">print-char</span><span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">48</span><span class="w"> </span><span class="p">(</span><span class="nv">%</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">10</span><span class="p">))))</span>
</pre></div>
<p>Compile and run it:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/fib.lisp<span class="w"> </span>--backend<span class="w"> </span>x86
$<span class="w"> </span>./build/prog
<span class="m">6765</span>
</pre></div>
<p>And we're in business!</p>
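<p>The kernel's <code>print</code> only has a one-byte write to work with, so it recurses on <code>n / 10</code> to emit the leading digits first and then writes the last digit as ASCII (48 is <code>'0'</code>). As a rough illustration of that logic outside of ulisp, here is the same idea sketched in Go. The names <code>printChar</code>, <code>printInt</code>, and <code>digits</code> are made up for this sketch, and an <code>io.Writer</code> stands in for the raw <code>write</code> syscall.</p>

```go
package main

import (
	"io"
	"os"
	"strings"
)

// printChar writes a single byte, standing in for (syscall/write 1 &c 1).
func printChar(w io.Writer, c byte) {
	w.Write([]byte{c})
}

// printInt mirrors the kernel's print: emit the leading digits first by
// recursing on n/10, then emit the last digit as an ASCII character.
// Like the original, it only handles non-negative integers.
func printInt(w io.Writer, n int) {
	if n > 9 {
		printInt(w, n/10)
	}
	printChar(w, byte(48+n%10))
}

// digits collects printInt's output into a string so it is easy to inspect.
func digits(n int) string {
	var sb strings.Builder
	printInt(&sb, n)
	return sb.String()
}

func main() {
	printInt(os.Stdout, 6765)
	printChar(os.Stdout, '\n')
}
```

<p>Because the recursive call happens before the write, the most significant digit is the first byte to reach the writer, with no buffer or string reversal needed.</p>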
<p><blockquote class="twitter-tweet" data-conversation="none"><p lang="en" dir="ltr">Latest post in the compiler basics series: an x86 upgrade. We've got basic syscall support, "infinite" locals and parameters, and if/else. More than enough to handle printing integers to stdout and recursive fibonacci. <a href="https://t.co/B3OV0vEX1V">https://t.co/B3OV0vEX1V</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1203816831456284677?ref_src=twsrc%5Etfw">December 8, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/compiler-basics-an-x86-upgrade.htmlSun, 08 Dec 2019 00:00:00 +0000
- Confusion and disengagement in meetingshttp://notes.eatonphil.com/confusion-disengagement-in-meetings.html<p>The quickest way to cut through confusion or disagreement among
otherwise amiable and honest folks is to ask questions.</p>
<p>Ask early so you don't waste time. But it's not enough to just ask
clarifying questions because the <strong>answers</strong> won't always be clear;
summarize what you heard as well.</p>
<p>Sounds like Human Interaction 101, and maybe it is. These techniques
show up more when discussing <strong>outcomes</strong> and very rarely when
discussing <strong>assumptions</strong>.</p>
<p>Meetings are called to discuss outcomes, not assumptions. But
assumptions almost always need to be called into question too.</p>
<p>If you have clarity personally but you observe confusion and
disengagement, <strong>ask questions and summarize</strong>. Someone must be aware
of the group and be willing to sound dumb.</p>
<p>If you aren't aware of confusion or disengagement, start paying
attention. Addressing it doesn't need to be hard, and doing so is personally
meaningful.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Address confusion and disengagement in meetings by asking questions and summarizing, whether you're confused or not. Question outcomes _and_ assumptions. <a href="https://t.co/2OPifEBSq5">https://t.co/2OPifEBSq5</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1200972237756674049?ref_src=twsrc%5Etfw">December 1, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/confusion-disengagement-in-meetings.htmlSat, 30 Nov 2019 00:00:00 +0000
- Interpreting Gohttp://notes.eatonphil.com/interpreting-go.html<p>After spending some time at work on tooling for keeping documentation
in sync with Go struct definitions, I had enough exposure to Go's
built-in parsing package that the next step was clear: write an
interpreter. <a href="http://notes.eatonphil.com/interpreting-typescript.html">It's a great way to get more comfortable with a
language's
AST.</a></p>
<p>In this post we'll use the Go parser package to interpret the AST
directly (as opposed to compiling to a bytecode VM), with just enough
functionality to support a recursive implementation of the fibonacci algorithm:</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">a</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">a</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">println</span><span class="p">(</span><span class="nx">fib</span><span class="p">(</span><span class="mi">15</span><span class="p">))</span>
<span class="p">}</span>
</pre></div>
<p class="note">
Go does have a built-in <code>println</code>, but our interpreter
won't know about any builtins. We'll provide our own
<code>println</code> in the runtime to make things easier on ourselves.
</p><p>The fibonacci algorithm is my go-to minimal program that forces us to
deal with basic aspects of:</p>
<ul>
<li>Function definitions</li>
<li>Function calls</li>
<li>Function arguments</li>
<li>Function return values</li>
<li>If/else</li>
<li>Assignment</li>
<li>Arithmetic and boolean operators</li>
</ul>
<p>We'll do this in around 200 LoC. Project code is available on
<a href="https://github.com/eatonphil/goi">GitHub</a>.</p>
<p>A follow-up post will cover support for an iterative fibonacci
implementation with support for basic aspects of:</p>
<ul>
<li>Local variables</li>
<li>Loops</li>
</ul>
<h3 id="first-steps">First steps</h3><p>I always start exploring an AST by practicing error-driven
development. It's helpful to have the Go
<a href="https://golang.org/pkg/go/ast/">AST</a>,
<a href="https://golang.org/pkg/go/parser/">parser</a>, and
<a href="https://golang.org/pkg/go/token/">token</a> package docs handy as well.</p>
<p>We'll focus on single-file programs and start with
<a href="https://golang.org/pkg/go/parser/#ParseFile">parser.ParseFile</a>. This
function will return an
<a href="https://golang.org/pkg/go/ast/#File">*ast.File</a>. This in turn
contains a list of
<a href="https://golang.org/pkg/go/ast/#Decl">Decl</a>s. Unfortunately Go stops
being helpful at this point because we have no clue what is going to
implement this <code>Decl</code> interface. So we'll switch on the
concrete type and error until we know what we need to know.</p>
<div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span>
<span class="kn">import</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s">"go/ast"</span>
<span class="w"> </span><span class="s">"go/parser"</span>
<span class="w"> </span><span class="s">"go/token"</span>
<span class="w"> </span><span class="s">"io/ioutil"</span>
<span class="w"> </span><span class="s">"log"</span>
<span class="w"> </span><span class="s">"os"</span>
<span class="w"> </span><span class="s">"reflect"</span>
<span class="p">)</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">f</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">File</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">decl</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Decls</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">decl</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unknown decl type (%s): %+v"</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">d</span><span class="p">),</span><span class="w"> </span><span class="nx">d</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fset</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">NewFileSet</span><span class="p">()</span><span class="w"> </span><span class="c1">// positions are relative to fset</span>
<span class="w"> </span><span class="nx">src</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ioutil</span><span class="p">.</span><span class="nx">ReadFile</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unable to read file: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">())</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">ParseFile</span><span class="p">(</span><span class="nx">fset</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="nx">src</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unable to parse file: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">())</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">f</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
<p>Build and run:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go
$<span class="w"> </span>./goi<span class="w"> </span>fib.go
<span class="m">2019</span>/10/12<span class="w"> </span><span class="m">09</span>:43:48<span class="w"> </span>Unknown<span class="w"> </span>decl<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.FuncDecl<span class="o">)</span>:<span class="w"> </span><span class="p">&</span><span class="o">{</span>Doc:&lt;nil&gt;<span class="w"> </span>Recv:&lt;nil&gt;<span class="w"> </span>Name:fib<span class="w"> </span>Type:0xc000096320<span class="w"> </span>Body:0xc00009a3c0<span class="o">}</span>
</pre></div>
<p>Cool! This is the declaration of the <code>fib</code> function and its
type is <a href="https://golang.org/pkg/go/ast/#FuncDecl">*ast.FuncDecl</a>.</p>
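<p>Once we know the concrete type, a plain type assertion is enough to pick function declarations out of <code>f.Decls</code>. Here is a small, self-contained sketch of that step. It parses a trimmed-down stand-in for <code>fib.go</code> from an in-memory string rather than from <code>os.Args[1]</code>, and <code>funcNames</code> is a helper invented for this example, not part of the project.</p>

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a trimmed-down stand-in for fib.go.
const src = `package main

func fib(a int) int {
	if a == 1 {
		return 0
	}
	return a
}

func main() {
	println(fib(15))
}
`

// funcNames returns the names of the top-level function declarations
// in the order the parser reports them.
func funcNames(source string) ([]string, error) {
	fset := token.NewFileSet()
	// ParseFile accepts the source as a string, []byte, or io.Reader.
	f, err := parser.ParseFile(fset, "fib.go", source, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range f.Decls {
		// Anything that is not a function declaration is skipped here.
		if fd, ok := decl.(*ast.FuncDecl); ok {
			names = append(names, fd.Name.Name)
		}
	}
	return names, nil
}

func main() {
	names, err := funcNames(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names)
}
```

<p>Skipping unknown declarations silently is fine for a listing like this. In the interpreter itself, failing loudly in the <code>default</code> case is more useful because it tells us what to implement next.</p>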
<h3 id="interpreting-ast.funcdecl">Interpreting ast.FuncDecl</h3><p>A function declaration is going to need to add its name to a context
map, mapped to a function reference for use in function calls. Since
Go throws everything into the same context namespace, we can
simply pass around a map of strings to <code>value</code>s where a
<code>value</code> can be any Go value. To facilitate this, we'll
define a <code>value</code> struct to hold an integer to represent
"kind" and an empty interface "value". When a value is referenced, we
will have to switch on the "kind" and then type-assert the "value".</p>
<p>Additionally, and unlike a value-oriented language like Scheme, we'll
need to track a set of <code>return</code> values at all stages
through interpretation so, when set, we can short circuit execution.</p>
<div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="kt">uint</span>
<span class="kd">const</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="nx">i64</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span>
<span class="w"> </span><span class="nx">fn</span>
<span class="w"> </span><span class="nx">bl</span>
<span class="p">)</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">kind</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kd">interface</span><span class="p">{}</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">context</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="nx">value</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="nx">context</span><span class="p">)</span><span class="w"> </span><span class="nb">copy</span><span class="p">()</span><span class="w"> </span><span class="nx">context</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cpy</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">context</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">cpy</span><span class="p">[</span><span class="nx">key</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">value</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">cpy</span>
<span class="p">}</span>
<span class="kd">type</span><span class="w"> </span><span class="nx">ret</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">set</span><span class="w"> </span><span class="kt">bool</span>
<span class="w"> </span><span class="nx">vs</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">)</span><span class="w"> </span><span class="nx">setValues</span><span class="p">(</span><span class="nx">vs</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">vs</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">vs</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">set</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">interpretFuncDecl</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">fd</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">FuncDecl</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ctx</span><span class="p">[</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Name</span><span class="p">.</span><span class="nx">String</span><span class="p">()]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">value</span><span class="p">{</span>
<span class="w"> </span><span class="nx">fn</span><span class="p">,</span>
<span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">f</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">File</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">decl</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Decls</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">decl</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">FuncDecl</span><span class="p">:</span>
<span class="w"> </span><span class="nx">interpretFuncDecl</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="p">)</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unknown decl type (%s): %+v"</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">d</span><span class="p">),</span><span class="w"> </span><span class="nx">d</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
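<p>To make the "switch on the kind, then assert the value" idea concrete, here is a minimal sketch of reading a <code>value</code> back out. The <code>describe</code> helper is hypothetical, and the sketch assumes an <code>i64</code> payload is stored as a Go <code>int64</code> and a <code>bl</code> payload as a Go <code>bool</code>.</p>

```go
package main

import "fmt"

type kind uint

const (
	i64 kind = iota
	fn
	bl
)

type value struct {
	kind  kind
	value interface{}
}

// describe is a hypothetical helper: it switches on the kind tag first and
// only then type-asserts the payload to the Go type that tag implies.
// Assumption: i64 payloads are int64 and bl payloads are bool.
func describe(v value) string {
	switch v.kind {
	case i64:
		return fmt.Sprintf("i64(%d)", v.value.(int64))
	case bl:
		return fmt.Sprintf("bl(%t)", v.value.(bool))
	case fn:
		return "fn"
	default:
		return "unknown"
	}
}

func main() {
	ctx := map[string]value{
		"answer": {i64, int64(42)},
		"done":   {bl, true},
	}
	fmt.Println(describe(ctx["answer"]))
	fmt.Println(describe(ctx["done"]))
}
```

<p>The tag is what makes the bare type assertion safe: if the kind and the payload ever disagree, the assertion panics, which is acceptable for a toy interpreter but worth keeping in mind.</p>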
<p>Now that we have the idea of return management and contexts set out,
let's fill out the actual function declaration callback. Inside we'll
need to copy the context so variables declared inside the function
are not visible outside. Then we'll iterate over the parameters and
map them in context to the associated argument. Finally we'll
interpret the body.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretBlockStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">fd</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">BlockStmt</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">interpretFuncDecl</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">fd</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">FuncDecl</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">ctx</span><span class="p">[</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Name</span><span class="p">.</span><span class="nx">String</span><span class="p">()]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">value</span><span class="p">{</span>
<span class="w"> </span><span class="nx">fn</span><span class="p">,</span>
<span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">childCtx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ctx</span><span class="p">.</span><span class="nb">copy</span><span class="p">()</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">param</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">Type</span><span class="p">.</span><span class="nx">Params</span><span class="p">.</span><span class="nx">List</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">childCtx</span><span class="p">[</span><span class="nx">param</span><span class="p">.</span><span class="nx">Names</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">String</span><span class="p">()]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">args</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">interpretBlockStmt</span><span class="p">(</span><span class="nx">childCtx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">Body</span><span class="p">)</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
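<p>The effect of copying the context can be seen in isolation. The following sketch reuses the <code>context</code> type and its <code>copy</code> method from above, with <code>kind</code> simplified to a plain <code>uint</code> to keep the example short, and shows that writes to the child map never reach the parent.</p>

```go
package main

import "fmt"

// value is simplified for this sketch: kind is a bare uint.
type value struct {
	kind  uint
	value interface{}
}

type context map[string]value

// copy makes a shallow copy: a new map holding the same values.
func (c context) copy() context {
	cpy := context{}
	for key, value := range c {
		cpy[key] = value
	}
	return cpy
}

func main() {
	parent := context{"a": {0, int64(1)}}

	// Simulate a function call: bind a parameter and a temporary
	// in the child context only.
	child := parent.copy()
	child["a"] = value{0, int64(15)}
	child["tmp"] = value{0, int64(99)}

	_, leaked := parent["tmp"]
	fmt.Println(parent["a"].value, child["a"].value, leaked)
}
```

<p>Copying the whole map on every call is simple but costs time proportional to the number of names in scope. A chain of parent pointers would avoid that, at the cost of a slower lookup.</p>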
<p>And we'll add a call to the interpreted <code>main</code> to the end
of the interpreter's <code>main</code>:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fset</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">NewFileSet</span><span class="p">()</span><span class="w"> </span><span class="c1">// positions are relative to fset</span>
<span class="w"> </span><span class="nx">src</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ioutil</span><span class="p">.</span><span class="nx">ReadFile</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unable to read file: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">())</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">ParseFile</span><span class="p">(</span><span class="nx">fset</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="nx">src</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unable to parse file: %s"</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">())</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">ctx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">context</span><span class="p">{}</span>
<span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">f</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="nx">ret</span>
<span class="w"> </span><span class="nx">ctx</span><span class="p">[</span><span class="s">"main"</span><span class="p">].</span><span class="nx">value</span><span class="p">.(</span><span class="kd">func</span><span class="p">(</span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">))(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">{})</span>
<span class="p">}</span>
</pre></div>
<p>Next step!</p>
<h3 id="interpreting-ast.blockstmt">Interpreting ast.BlockStmt</h3><p>For this AST node, we'll iterate over each statement and interpret
it. If the return value has been set we'll exit the loop to
short circuit execution.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">interpretBlockStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">bs</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">BlockStmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">bs</span><span class="p">.</span><span class="nx">List</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="p">)</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">set</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
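<p>To see the short circuit in isolation, here is a minimal sketch
that is separate from the interpreter. The <code>ret</code> type
here is a simplified stand-in that holds ints, and <code>stmt</code>
and <code>runBlock</code> are hypothetical names used only for this
illustration.</p>

```go
package main

import "fmt"

// ret is a simplified stand-in for the interpreter's ret type:
// a slot for return values plus a flag saying whether it was set.
type ret struct {
	set    bool
	values []int
}

// stmt is a hypothetical statement: a function that may set the return value.
type stmt func(r *ret)

// runBlock runs statements in order and stops as soon as one of them
// sets r. It returns how many statements were executed.
func runBlock(r *ret, stmts []stmt) int {
	executed := 0
	for _, s := range stmts {
		s(r)
		executed++
		if r.set {
			break
		}
	}
	return executed
}

func main() {
	var r ret
	n := runBlock(&r, []stmt{
		func(r *ret) {}, // an ordinary statement
		func(r *ret) { r.values = []int{42}; r.set = true }, // behaves like "return 42"
		func(r *ret) { r.values = []int{0} },                // never reached
	})
	fmt.Println(n, r.values[0])
}
```

<p>The third statement never runs, so the value stored by the second
one survives.</p>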
<p>Next step!</p>
<h3 id="interpreting-ast.stmt">Interpreting ast.Stmt</h3><p>Since <a href="https://golang.org/pkg/go/ast/#Stmt">ast.Stmt</a> is another
interface, we're back to error-driven development.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unknown stmt type (%s): %+v"</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>And the trigger:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go
$<span class="w"> </span>./goi<span class="w"> </span>fib.go
<span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:15:14<span class="w"> </span>Unknown<span class="w"> </span>stmt<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.ExprStmt<span class="o">)</span>:<span class="w"> </span><span class="p">&</span><span class="o">{</span>X:0xc0000a02c0<span class="o">}</span>
</pre></div>
<p>Great! Checking the docs on
<a href="https://golang.org/pkg/go/ast/#ExprStmt">ast.ExprStmt</a> we'll just
skip directly to a call to a new function <code>interpretExpr</code>:</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Expr</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">ExprStmt</span><span class="p">:</span>
<span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">X</span><span class="p">)</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unknown stmt type (%s): %+v"</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
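<p>If you want to peek at which statement types a program will hit
before the interpreter trips over them, a small standalone sketch like
the following can help. It assumes nothing from the interpreter;
<code>stmtKinds</code> is a hypothetical helper, and it uses the
<code>%T</code> verb instead of <code>reflect.TypeOf</code> to print
the concrete type.</p>

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// stmtKinds parses src and returns the concrete type name of each
// top-level statement in the body of func main.
func stmtKinds(src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}
	var kinds []string
	for _, decl := range f.Decls {
		fd, ok := decl.(*ast.FuncDecl)
		if !ok || fd.Name.Name != "main" || fd.Body == nil {
			continue
		}
		for _, s := range fd.Body.List {
			// %T prints the dynamic type, e.g. *ast.ExprStmt.
			kinds = append(kinds, fmt.Sprintf("%T", s))
		}
	}
	return kinds, nil
}

func main() {
	src := "package main\nfunc main() {\n\tx := 1\n\tprintln(x)\n\tif x > 0 {\n\t}\n}\n"
	kinds, err := stmtKinds(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, k := range kinds {
		fmt.Println(k)
	}
}
```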
<p>Moving on!</p>
<h3 id="interpreting-ast.expr">Interpreting ast.Expr</h3><p>We've got another
<a href="https://golang.org/pkg/go/ast/#Expr">interface</a>. Let's error!</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">expr</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unknown expr type (%s): %+v"</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">e</span><span class="p">),</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>And the trigger:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go
$<span class="w"> </span>./goi<span class="w"> </span>fib.go
<span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:19:16<span class="w"> </span>Unknown<span class="w"> </span>expr<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.CallExpr<span class="o">)</span>:<span class="w"> </span><span class="p">&</span><span class="o">{</span>Fun:println<span class="w"> </span>Lparen:146<span class="w"> </span>Args:<span class="o">[</span>0xc0000a2280<span class="o">]</span><span class="w"> </span>Ellipsis:0<span class="w"> </span>Rparen:154<span class="o">}</span>
</pre></div>
<p>Cool! For a call we'll evaluate the function expression itself,
evaluate the arguments, cast the resulting function value to a
function, and call it.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretCallExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">ce</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">CallExpr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">fnr</span><span class="w"> </span><span class="nx">ret</span>
<span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">fnr</span><span class="p">,</span><span class="w"> </span><span class="nx">ce</span><span class="p">.</span><span class="nx">Fun</span><span class="p">)</span>
<span class="w"> </span><span class="nx">fn</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fnr</span><span class="p">.</span><span class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">ce</span><span class="p">.</span><span class="nx">Args</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">vr</span><span class="w"> </span><span class="nx">ret</span>
<span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">vr</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="p">)</span>
<span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">values</span><span class="p">,</span><span class="w"> </span><span class="nx">vr</span><span class="p">.</span><span class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kd">func</span><span class="p">(</span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">))(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">values</span><span class="p">)</span>
<span class="p">}</span>
</pre></div>
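<p>The cast on the last line panics if the value is not a function.
As a rough sketch of a gentler alternative, the comma-ok form of a
type assertion reports failure instead of panicking. The
<code>value</code> type below only mirrors the shape used in this
post (with a string kind, which is an assumption), and
<code>callable</code> and <code>call</code> are hypothetical names
with a simplified signature.</p>

```go
package main

import "fmt"

// value mirrors the shape of the interpreter's value type.
// Using a string for kind is an assumption made for this sketch.
type value struct {
	kind  string
	value interface{}
}

// callable is a hypothetical, simplified function signature.
type callable func(args []value) value

// call uses the comma-ok type assertion so that calling a
// non-function yields an error rather than a panic.
func call(fn value, args []value) (value, error) {
	f, ok := fn.value.(callable)
	if !ok {
		return value{}, fmt.Errorf("cannot call value of kind %q", fn.kind)
	}
	return f(args), nil
}

func main() {
	double := value{"fn", callable(func(args []value) value {
		return value{"i64", args[0].value.(int64) * 2}
	})}

	out, err := call(double, []value{{"i64", int64(21)}})
	fmt.Println(out.value, err)

	// Calling an integer fails cleanly.
	_, err = call(value{"i64", int64(1)}, nil)
	fmt.Println(err)
}
```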
<p class="note">
  All of this casting is unsafe because we don't have a
  type-checking stage. We can ignore that for now: once a
  type-checking stage is introduced (which it needs to be at some
  point), it will prevent bad casts from happening.
</p><h3 id="handling-more-ast.expr-implementations">Handling more ast.Expr implementations</h3><p>Let's give the interpreter a shot again:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go
$<span class="w"> </span>./goi<span class="w"> </span>fib.go
<span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:28:17<span class="w"> </span>Unknown<span class="w"> </span>expr<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.Ident<span class="o">)</span>:<span class="w"> </span>println
</pre></div>
<p>We'll need to add <a href="https://golang.org/pkg/go/ast/#Ident">ast.Ident</a>
support to <code>interpretExpr</code>. This will be a simple
lookup in context. We'll also add a <code>setValue</code> helper since
the <code>ret</code> value is serving double-duty as a value passing
mechanism and also a function's return value (the only place where
multiple values are a thing).</p>
<div class="highlight"><pre><span></span><span class="o">...</span>
<span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">)</span><span class="w"> </span><span class="nx">setValue</span><span class="p">(</span><span class="nx">v</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">values</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">{</span><span class="nx">v</span><span class="p">}</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">set</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span>
<span class="p">}</span>
<span class="o">...</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">expr</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">CallExpr</span><span class="p">:</span>
<span class="w"> </span><span class="nx">interpretCallExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Ident</span><span class="p">:</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">ctx</span><span class="p">[</span><span class="nx">e</span><span class="p">.</span><span class="nx">Name</span><span class="p">])</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unknown expr type (%s): %+v"</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">e</span><span class="p">),</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
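<p>One thing to keep in mind with a plain map lookup: an undefined
identifier silently yields the zero value. A minimal sketch of a
checked lookup, assuming a <code>context</code> that is a map from
names to values (the <code>lookup</code> helper and the string kind
are hypothetical):</p>

```go
package main

import "fmt"

// value mirrors the shape of the interpreter's value type.
// Using a string for kind is an assumption made for this sketch.
type value struct {
	kind  string
	value interface{}
}

// context is assumed to be a map from identifier names to values.
type context map[string]value

// lookup returns the value bound to name, or an error when the
// name was never defined.
func lookup(ctx context, name string) (value, error) {
	v, ok := ctx[name]
	if !ok {
		return value{}, fmt.Errorf("undefined: %s", name)
	}
	return v, nil
}

func main() {
	ctx := context{"n": {"i64", int64(15)}}

	v, err := lookup(ctx, "n")
	fmt.Println(v.value, err)

	_, err = lookup(ctx, "m")
	fmt.Println(err)
}
```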
<p>This is also a good time to add the <code>println</code> builtin to
our top-level context.</p>
<div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">...</span>
<span class="w"> </span><span class="n">ctx</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">context</span><span class="p">{}</span>
<span class="w"> </span><span class="n">interpret</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span><span class="w"> </span><span class="n">f</span><span class="p">)</span>
<span class="w"> </span><span class="n">ctx</span><span class="p">[</span><span class="s2">"println"</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">{</span>
<span class="w"> </span><span class="n">fn</span><span class="p">,</span>
<span class="w"> </span><span class="k">func</span><span class="p">(</span><span class="n">ctx</span><span class="w"> </span><span class="n">context</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">*</span><span class="n">ret</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="p">[]</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="p">[]</span><span class="n">interface</span><span class="p">{}</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">_</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="nb">range</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">values</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="n">values</span><span class="o">...</span><span class="p">)</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="n">ret</span>
<span class="w"> </span><span class="n">ctx</span><span class="p">[</span><span class="s2">"main"</span><span class="p">]</span><span class="o">.</span><span class="n">value</span><span class="o">.</span><span class="p">(</span><span class="k">func</span><span class="p">(</span><span class="n">context</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">ret</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="n">value</span><span class="p">))(</span><span class="n">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="n">r</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="n">value</span><span class="p">{})</span>
<span class="p">}</span>
</pre></div>
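<p>The loop in the builtin exists because Go won't convert a slice of
our value type into a <code>[]interface{}</code> for us; each element
has to be copied over. Here is that step on its own as a small
sketch. The names <code>unwrap</code> and <code>render</code> are
hypothetical, and the string kind is an assumption.</p>

```go
package main

import "fmt"

// value mirrors the shape of the interpreter's value type.
// Using a string for kind is an assumption made for this sketch.
type value struct {
	kind  string
	value interface{}
}

// unwrap copies the payload of each value into a []interface{},
// which is the slice type fmt's variadic functions accept.
func unwrap(args []value) []interface{} {
	out := make([]interface{}, 0, len(args))
	for _, arg := range args {
		out = append(out, arg.value)
	}
	return out
}

// render formats the payloads the way fmt.Println would print them.
func render(args []value) string {
	return fmt.Sprintln(unwrap(args)...)
}

func main() {
	fmt.Print(render([]value{{"i64", int64(15)}, {"i64", int64(610)}}))
}
```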
<h3 id="more-ast.expr-implementations">More ast.Expr implementations</h3><p>Running the interpreter again we get:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go
$<span class="w"> </span>./goi<span class="w"> </span>fib.go
<span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:41:59<span class="w"> </span>Unknown<span class="w"> </span>expr<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.BasicLit<span class="o">)</span>:<span class="w"> </span><span class="p">&</span><span class="o">{</span>ValuePos:151<span class="w"> </span>Kind:INT<span class="w"> </span>Value:15<span class="o">}</span>
</pre></div>
<p>Easy enough: we'll switch on the literal's "kind", parse the string
form of an integer literal into an int64, and wrap it in our value type.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">expr</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">CallExpr</span><span class="p">:</span>
<span class="w"> </span><span class="nx">interpretCallExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Ident</span><span class="p">:</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">ctx</span><span class="p">[</span><span class="nx">e</span><span class="p">.</span><span class="nx">Name</span><span class="p">])</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">BasicLit</span><span class="p">:</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">e</span><span class="p">.</span><span class="nx">Kind</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">INT</span><span class="p">:</span>
<span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseInt</span><span class="p">(</span><span class="nx">e</span><span class="p">.</span><span class="nx">Value</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">value</span><span class="p">{</span><span class="nx">i64</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">})</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unknown basiclit type: %+v"</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unknown expr type (%s): %+v"</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">e</span><span class="p">),</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
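<p>The code above parses in base 10 and drops the error, which is
fine for <code>fib.go</code>. If we later wanted hex, octal, binary or
underscore-separated literals, one option is to pass base 0 to
<code>strconv.ParseInt</code> so the base is inferred from the
prefix. A small sketch, where <code>parseIntLit</code> is a
hypothetical helper:</p>

```go
package main

import (
	"fmt"
	"strconv"
)

// parseIntLit parses the text of an integer literal. Base 0 tells
// strconv.ParseInt to infer the base from the prefix (0x, 0o, 0b, 0)
// and to accept underscores as digit separators.
func parseIntLit(lit string) (int64, error) {
	return strconv.ParseInt(lit, 0, 64)
}

func main() {
	for _, lit := range []string{"15", "0x1F", "0b101", "1_000", "12abc"} {
		i, err := parseIntLit(lit)
		if err != nil {
			fmt.Println(lit, "error:", err)
			continue
		}
		fmt.Println(lit, "=", i)
	}
}
```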
<p>Now we run again:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go
$<span class="w"> </span>./goi<span class="w"> </span>fib.go
<span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:48:46<span class="w"> </span>Unknown<span class="w"> </span>stmt<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.IfStmt<span class="o">)</span>:<span class="w"> </span><span class="p">&</span><span class="o">{</span>If:38<span class="w"> </span>Init:<nil><span class="w"> </span>Cond:0xc0000ac150<span class="w"> </span>Body:0xc0000ac1b0<span class="w"> </span>Else:<nil><span class="o">}</span>
</pre></div>
<p>Cool, more control flow!</p>
<h3 id="interpreting-ast.ifstmt">Interpreting ast.IfStmt</h3><p>For <a href="https://golang.org/pkg/go/ast/#IfStmt">ast.IfStmt</a> we interpret
the condition and, depending on the result, interpret the body or
the else node. To make a missing init statement or else branch easier
to handle, we'll also add a nil short-circuit to <code>interpretStmt</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretIfStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">is</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">IfStmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">is</span><span class="p">.</span><span class="nx">Init</span><span class="p">)</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">cr</span><span class="w"> </span><span class="nx">ret</span>
<span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">cr</span><span class="p">,</span><span class="w"> </span><span class="nx">is</span><span class="p">.</span><span class="nx">Cond</span><span class="p">)</span>
<span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cr</span><span class="p">.</span><span class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">interpretBlockStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">is</span><span class="p">.</span><span class="nx">Body</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">is</span><span class="p">.</span><span class="nx">Else</span><span class="p">)</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">IfStmt</span><span class="p">:</span>
<span class="w"> </span><span class="nx">interpretIfStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span>
<span class="w"> </span><span class="o">...</span>
</pre></div>
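<p>The nil short-circuit works because <code>Init</code> and <code>Else</code> are plain <code>ast.Stmt</code> interface values that the parser leaves nil when the source omits them. Here is a small standalone sketch that checks this, separate from the interpreter itself. The helper name <code>ifParts</code> and the sample sources are made up for illustration.</p>

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// ifParts (a hypothetical helper) reports whether the first if statement
// found in src has an Init statement and an Else branch.
func ifParts(src string) (hasInit, hasElse bool, err error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "example.go", src, 0)
	if err != nil {
		return false, false, err
	}
	found := false
	ast.Inspect(f, func(n ast.Node) bool {
		if found {
			return false
		}
		if is, ok := n.(*ast.IfStmt); ok {
			// Both fields are nil interfaces when absent from the source.
			hasInit = is.Init != nil
			hasElse = is.Else != nil
			found = true
			return false
		}
		return true
	})
	if !found {
		return false, false, fmt.Errorf("no if statement in source")
	}
	return hasInit, hasElse, nil
}

const withoutElse = `package main

func f(a int) int {
	if a == 1 {
		return 0
	}
	return 1
}`

const withInitAndElse = `package main

func f() int {
	if x := 1; x == 1 {
		return 0
	} else {
		return 1
	}
}`

func main() {
	for _, src := range []string{withoutElse, withInitAndElse} {
		hasInit, hasElse, err := ifParts(src)
		fmt.Printf("init=%v else=%v err=%v\n", hasInit, hasElse, err)
	}
}
```

<p>For the <code>if</code> statements in fib.go both fields are nil, so the two <code>interpretStmt</code> calls in <code>interpretIfStmt</code> return immediately.</p>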
<p>Let's try it out:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go
$<span class="w"> </span>./goi<span class="w"> </span>fib.go
<span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:56:28<span class="w"> </span>Unknown<span class="w"> </span>expr<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.BinaryExpr<span class="o">)</span>:<span class="w"> </span><span class="p">&</span><span class="o">{</span>X:a<span class="w"> </span>OpPos:43<span class="w"> </span>Op:<span class="o">==</span><span class="w"> </span>Y:0xc00008a120<span class="o">}</span>
</pre></div>
<p>Great!</p>
<h3 id="interpreting-ast.binaryexpr">Interpreting ast.BinaryExpr</h3><p>An <a href="https://golang.org/pkg/go/ast/#BinaryExpr">ast.BinaryExpr</a> has an
<code>Op</code> field that we'll switch on to decide what operations
to do. We'll interpret the left side and then the right side and
finally perform the operation and return the result. The three binary
operations we use in this program are <code>==</code>, <code>+</code>
and <code>-</code>. We'll look these up in <a href="https://golang.org/pkg/go/token/#Token">go/token
docs</a> to discover the
associated constants.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretBinaryExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">bexpr</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">BinaryExpr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">xr</span><span class="p">,</span><span class="w"> </span><span class="nx">yr</span><span class="w"> </span><span class="nx">ret</span>
<span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">xr</span><span class="p">,</span><span class="w"> </span><span class="nx">bexpr</span><span class="p">.</span><span class="nx">X</span><span class="p">)</span>
<span class="w"> </span><span class="nx">x</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">xr</span><span class="p">.</span><span class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">yr</span><span class="p">,</span><span class="w"> </span><span class="nx">bexpr</span><span class="p">.</span><span class="nx">Y</span><span class="p">)</span>
<span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">yr</span><span class="p">.</span><span class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">bexpr</span><span class="p">.</span><span class="nx">Op</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">ADD</span><span class="p">:</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">value</span><span class="p">{</span><span class="nx">i64</span><span class="p">,</span><span class="w"> </span><span class="nx">x</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">y</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)})</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">SUB</span><span class="p">:</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">value</span><span class="p">{</span><span class="nx">i64</span><span class="p">,</span><span class="w"> </span><span class="nx">x</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">y</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)})</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">EQL</span><span class="p">:</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">value</span><span class="p">{</span><span class="nx">bl</span><span class="p">,</span><span class="w"> </span><span class="nx">x</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">y</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)})</span>
<span class="w"> </span><span class="k">default</span><span class="p">:</span>
<span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">"Unknown binary expression type: %+v"</span><span class="p">,</span><span class="w"> </span><span class="nx">bexpr</span><span class="p">)</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">expr</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">BinaryExpr</span><span class="p">:</span>
<span class="w"> </span><span class="nx">interpretBinaryExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span>
<span class="w"> </span><span class="o">...</span>
</pre></div>
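<p>To see the <code>Op</code> switch in isolation, here is a minimal standalone sketch that evaluates integer expressions parsed with <code>parser.ParseExpr</code>. It is not the interpreter from this post: <code>eval</code> and <code>evalSource</code> are hypothetical names, it only knows integer literals and parentheses, and it returns errors instead of calling <code>log.Fatalf</code>.</p>

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// eval handles integer literals, parentheses and the three binary
// operators used by fib.go. Everything else is reported as an error.
func eval(expr ast.Expr) (interface{}, error) {
	switch e := expr.(type) {
	case *ast.BasicLit:
		if e.Kind != token.INT {
			return nil, fmt.Errorf("unsupported literal kind: %s", e.Kind)
		}
		return strconv.ParseInt(e.Value, 0, 64)
	case *ast.ParenExpr:
		return eval(e.X)
	case *ast.BinaryExpr:
		// Left side first, then right side, then the operation.
		x, err := eval(e.X)
		if err != nil {
			return nil, err
		}
		y, err := eval(e.Y)
		if err != nil {
			return nil, err
		}
		// Comma-ok assertions avoid a panic on non-integer operands.
		xi, xok := x.(int64)
		yi, yok := y.(int64)
		if !xok || !yok {
			return nil, fmt.Errorf("operands must be int64, got %T and %T", x, y)
		}
		switch e.Op {
		case token.ADD:
			return xi + yi, nil
		case token.SUB:
			return xi - yi, nil
		case token.EQL:
			return xi == yi, nil
		}
		return nil, fmt.Errorf("unsupported operator: %s", e.Op)
	}
	return nil, fmt.Errorf("unsupported expression type: %T", expr)
}

// evalSource parses a single Go expression and evaluates it.
func evalSource(src string) (interface{}, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return nil, err
	}
	return eval(expr)
}

func main() {
	for _, src := range []string{"1 + 2", "5 - 3 == 2", "2 * 3"} {
		v, err := evalSource(src)
		fmt.Printf("%s => %v (err: %v)\n", src, v, err)
	}
}
```

<p>The last input fails on purpose: <code>token.MUL</code> is not in the switch, which mirrors the <code>default</code> case above.</p>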
<p>Let's try one more time!</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go
$<span class="w"> </span>./goi<span class="w"> </span>fib.go
<span class="m">2019</span>/10/12<span class="w"> </span><span class="m">11</span>:06:19<span class="w"> </span>Unknown<span class="w"> </span>stmt<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.ReturnStmt<span class="o">)</span>:<span class="w"> </span><span class="p">&</span><span class="o">{</span>Return:94<span class="w"> </span>Results:<span class="o">[</span>0xc000070540<span class="o">]}</span>
</pre></div>
<p>Awesome, last step.</p>
<h3 id="interpreting-ast.returnstmt">Interpreting ast.ReturnStmt</h3><p>Based on the
<a href="https://golang.org/pkg/go/ast/#ReturnStmt">ast.ReturnStmt</a> definition
we'll have to interpret each expression and set all of them to the
<code>ret</code> value.</p>
<div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretReturnStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">ReturnStmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">Results</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// er holds this expression's result; the name avoids shadowing the outer r.</span>
<span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">er</span><span class="w"> </span><span class="nx">ret</span>
<span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&</span><span class="nx">er</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="p">)</span>
<span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">values</span><span class="p">,</span><span class="w"> </span><span class="nx">er</span><span class="p">.</span><span class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValues</span><span class="p">(</span><span class="nx">values</span><span class="p">)</span>
<span class="w"> </span><span class="k">return</span>
<span class="p">}</span>
<span class="kd">func</span><span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">ReturnStmt</span><span class="p">:</span>
<span class="w"> </span><span class="nx">interpretReturnStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span>
<span class="w"> </span><span class="o">...</span>
</pre></div>
<p>And let's try one last time:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go
$<span class="w"> </span>./goi<span class="w"> </span>fib.go
<span class="m">377</span>
</pre></div>
<p>Looking good. :) Let's try with another input:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>fib.go
package<span class="w"> </span>main
func<span class="w"> </span>fib<span class="o">(</span>a<span class="w"> </span>int<span class="o">)</span><span class="w"> </span>int<span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="m">1</span>
<span class="w"> </span><span class="o">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span>fib<span class="o">(</span>a-1<span class="o">)</span><span class="w"> </span>+<span class="w"> </span>fib<span class="o">(</span>a-2<span class="o">)</span>
<span class="o">}</span>
func<span class="w"> </span>main<span class="o">()</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>println<span class="o">(</span>fib<span class="o">(</span><span class="m">14</span><span class="o">))</span>
<span class="o">}</span>
$<span class="w"> </span>./goi<span class="w"> </span>fib.go
<span class="m">233</span>
</pre></div>
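<p>As a sanity check on both outputs, here is the same function compiled and run natively. Under this 1-indexed definition the earlier result of 377 corresponds to <code>fib(15)</code>, which is presumably what the first version of fib.go called (an assumption, since that call is not shown here).</p>

```go
package main

import "fmt"

// fib mirrors the function in fib.go: the sequence is 1-indexed and
// starts 0, 1, 1, 2, ...
func fib(a int) int {
	if a == 1 {
		return 0
	}
	if a == 2 {
		return 1
	}
	return fib(a-1) + fib(a-2)
}

func main() {
	for _, n := range []int{14, 15} {
		fmt.Printf("fib(%d) = %d\n", n, fib(n))
	}
}
```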
<p>We've got the basics of an interpreter for Golang.</p>
<p><blockquote class="twitter-tweet" data-conversation="none"><p lang="en" dir="ltr">Here's a blog post on building a simple AST interpreter for Go to support running a recursive fibonacci implementation <a href="https://t.co/5Zz388d8ZN">https://t.co/5Zz388d8ZN</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1183039387170430976?ref_src=twsrc%5Etfw">October 12, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/interpreting-go.htmlSat, 12 Oct 2019 00:00:00 +0000
- Administering Kubernetes is hardhttp://notes.eatonphil.com/administering-kubernetes-is-hard.html<p>Kubernetes is easy to use after some exposure; it's pretty convenient
too. But it is super hard to set up.</p>
<p><a href="https://eksctl.io">eksctl</a> is a good tool for folks who don't want to
spend hours/days/weeks debugging VPC configuration in 1000s of lines
of CloudFormation. None of the other tools seem to be that much easier
to use (kops, kubeadm, etc.).</p>
<p>But even with EKS and eksctl you are constrained to Amazon Linux
worker nodes. AMIs are practically impossible to discover.</p>
<p>I haven't spent much time with GKE.</p>
<p>And while eksctl operates on the right level for developers needing to
administer small/medium-sized systems, it... doesn't exist outside
EKS.</p>
<p>It is unfortunate the only major container orchestration system is
this complex to administer. The user-facing APIs are pretty solid and
guide toward sustainable system design. Yet it is <em>really</em> hard to see the
value for most companies with medium-sized deployments that have to
administer it themselves. Among serious proprietary alternatives, sure, there's
ECS and Google App Engine, but existing Kubernetes knowledge transfers
poorly to them. The OSS alternatives don't have the
adoption to seem like a good investment.</p>
<p>OpenStack's <a href="https://wiki.openstack.org/wiki/Magnum">magnum</a> or
OpenShift seem like possible high-level providers for a generic
environment. But neither is particularly known for stability.</p>
<p>In all, the ecosystem has gotten friendlier. There will probably be a
time in the future (3-5 years from now?) when Kubernetes is fairly
easy to administer.</p>
<p>I'd love to hear your thoughts and experiences administering
Kubernetes.</p>
http://notes.eatonphil.com/administering-kubernetes-is-hard.htmlMon, 30 Sep 2019 00:00:00 +0000
- Unit testing C code with gtesthttp://notes.eatonphil.com/unit-testing-c-code-with-gtest.html<p>This post covers building and testing a minimal, but still useful, C
project. We'll use <a href="https://github.com/google/googletest">Google's
gtest</a> and
<a href="https://cmake.org">CMake</a> for testing C code. This will serve as a
foundation for some upcoming posts/projects on programming Linux,
userland networking and interpreters.</p>
<p class="note">
The first version of this post only included one module to
test. The <code>test/CMakeLists.txt</code> would also only expose a
single pass-fail status for all modules. The second version of this
post extends the <code>test/CMakeLists.txt</code> to expose
each <code>test/*.cpp</code> file as its own CMake test so that
results are displayed by <code>ctest</code> per file. The second
version also splits the original <code>src/testy.c</code>
and <code>include/testy/testy.h</code> module into
a <code>widget</code> and <code>customer</code> module to
demonstrate the changes to the CMake configuration.
</p><h3 id="the-"testy"-sample-project">The "testy" sample project</h3><p>In this project, we'll put source code in <code>src/</code> and publicly
exported symbols (functions, structs, etc.) in header files in
<code>include/testy/</code>. There will be a <code>main.c</code> in the <code>src/</code>
directory. Tests are written in C++ (since gtest is a C++ testing
framework) and are in the <code>test/</code> directory.</p>
<p>Here's an overview of the source and test code.</p>
<h4 id="src/widget.c">src/widget.c</h4><p>This file has some library code that we should be able to test.</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"testy/widget.h"</span>
<span class="kt">int</span><span class="w"> </span><span class="n">private_ok_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span>
<span class="kt">int</span><span class="w"> </span><span class="nf">widget_ok</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">private_ok_value</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<h4 id="include/testy/widget.h">include/testy/widget.h</h4><p>This file handles exported symbols for widget code.</p>
<div class="highlight"><pre><span></span><span class="cp">#ifndef _WIDGET_H_</span>
<span class="cp">#define _WIDGET_H_</span>
<span class="kt">int</span><span class="w"> </span><span class="nf">widget_ok</span><span class="p">(</span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">);</span>
<span class="cp">#endif</span>
</pre></div>
<h4 id="src/customer.c">src/customer.c</h4><p>This file has some more library code that we should be able to test.</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"testy/customer.h"</span>
<span class="kt">int</span><span class="w"> </span><span class="nf">customer_check</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">a</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">5</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<h4 id="include/testy/customer.h">include/testy/customer.h</h4><p>This file handles exported symbols for customer code.</p>
<div class="highlight"><pre><span></span><span class="cp">#ifndef _CUSTOMER_H_</span>
<span class="cp">#define _CUSTOMER_H_</span>
<span class="kt">int</span><span class="w"> </span><span class="nf">customer_check</span><span class="p">(</span><span class="kt">int</span><span class="p">);</span>
<span class="cp">#endif</span>
</pre></div>
<h4 id="src/main.c">src/main.c</h4><p>This is the entrypoint to a program built around libtesty.</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"testy/customer.h"</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"testy/widget.h"</span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">widget_ok</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">customer_check</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<h4 id="test/widget.cpp">test/widget.cpp</h4><p>This is one of our test files. It registers test cases and uses gtest
to make assertions. We need to wrap the <code>testy/widget.h</code> include in an
<code>extern "C"</code> block to stop the C++ compiler from
<a href="https://www.geeksforgeeks.org/extern-c-in-c/">name-mangling</a> the C function names, which would otherwise fail to link.</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"gtest/gtest.h"</span>
<span class="k">extern</span><span class="w"> </span><span class="s">"C"</span><span class="w"> </span><span class="p">{</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"testy/widget.h"</span>
<span class="p">}</span>
<span class="n">TEST</span><span class="p">(</span><span class="n">widget</span><span class="p">,</span><span class="w"> </span><span class="n">ok</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ASSERT_EQ</span><span class="p">(</span><span class="n">widget_ok</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">TEST</span><span class="p">(</span><span class="n">widget</span><span class="p">,</span><span class="w"> </span><span class="n">not_ok</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ASSERT_EQ</span><span class="p">(</span><span class="n">widget_ok</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>You can see a good high-level overview of gtest testing utilities like
<code>ASSERT_EQ</code> and <code>TEST</code>
<a href="https://github.com/google/googletest/blob/master/googletest/docs/primer.md">here</a>.</p>
<h4 id="test/customer.cpp">test/customer.cpp</h4><p>This is another one of our test files.</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"gtest/gtest.h"</span>
<span class="k">extern</span><span class="w"> </span><span class="s">"C"</span><span class="w"> </span><span class="p">{</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">"testy/customer.h"</span>
<span class="p">}</span>
<span class="n">TEST</span><span class="p">(</span><span class="n">customer</span><span class="p">,</span><span class="w"> </span><span class="n">ok</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ASSERT_EQ</span><span class="p">(</span><span class="n">customer_check</span><span class="p">(</span><span class="mi">5</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">TEST</span><span class="p">(</span><span class="n">customer</span><span class="p">,</span><span class="w"> </span><span class="n">not_ok</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">ASSERT_EQ</span><span class="p">(</span><span class="n">customer_check</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<h4 id="test/main.cpp">test/main.cpp</h4><p>This is a standard entrypoint for the test runner.</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"gtest/gtest.h"</span>
<span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">argv</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="o">::</span><span class="n">testing</span><span class="o">::</span><span class="n">InitGoogleTest</span><span class="p">(</span><span class="o">&</span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="n">argv</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">RUN_ALL_TESTS</span><span class="p">();</span>
<span class="p">}</span>
</pre></div>
<h3 id="building-with-cmake">Building with CMake</h3><p><a href="https://cmake.org">CMake</a> is a build tool that (among other things)
produces a Makefile we can run to build our code. We will also use it
for dependency management. But fundamentally we use it because gtest
requires it.</p>
<p>CMake options/rules are defined in a CMakeLists.txt file. We'll have
one in the root directory, one in the test directory, and a template
for one that will handle the gtest dependency.</p>
<p>A first draft of the top-level CMakeLists.txt might look like this:</p>
<div class="highlight"><pre><span></span><span class="nb">cmake_minimum_required</span><span class="p">(</span><span class="s">VERSION</span><span class="w"> </span><span class="s">3.1</span><span class="p">)</span>
<span class="nb">project</span><span class="p">(</span><span class="s">testy</span><span class="p">)</span>
<span class="c">##</span>
<span class="c">### Source definitions ###</span>
<span class="c">##</span>
<span class="nb">include_directories</span><span class="p">(</span><span class="s2">"${PROJECT_SOURCE_DIR}/include"</span><span class="p">)</span>
<span class="nb">file</span><span class="p">(</span><span class="s">GLOB</span><span class="w"> </span><span class="s">sources</span><span class="w"> </span><span class="s2">"${PROJECT_SOURCE_DIR}/src/*.c"</span><span class="p">)</span>
<span class="nb">add_executable</span><span class="p">(</span><span class="s">testy</span><span class="w"> </span><span class="o">${</span><span class="nv">sources</span><span class="o">}</span><span class="p">)</span>
</pre></div>
<p>Using <code>include_directories</code> will make sure we compile with the <code>-I</code> flag
set up correctly for our include directory.</p>
<p>Using <code>add_executable</code> sets up the binary name to produce from the
given sources. And we're using the <code>file</code> helper to get a glob match
of C files rather than listing them all out verbatim in the
<code>add_executable</code> call.</p>
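<p>One caveat with the glob: it is evaluated when <code>cmake</code> runs, not when
<code>make</code> runs, so a newly added source file is only picked up after
re-running <code>cmake</code>. As a rough sketch of the alternative (not what this
post uses), the sources can be listed explicitly and the include path
scoped to the one target. The file names below are taken from the build
output in this post.</p>
<div class="highlight"><pre><span></span>cmake_minimum_required(VERSION 3.1)
project(testy)

# Explicit source list instead of file(GLOB ...). Paths are relative to
# the directory containing this CMakeLists.txt.
add_executable(testy
  src/customer.c
  src/widget.c
  src/main.c)

# Apply the -I flag to this target only, rather than to every target
# defined in this directory and below.
target_include_directories(testy PRIVATE "${PROJECT_SOURCE_DIR}/include")
</pre></div>
<p>The trade-off is a little more typing per new file in exchange for a
build that notices added or removed sources on its own.</p>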
<h4 id="building-and-running">Building and running</h4><p>CMake writes its generated files into the directory it is run from, and
it is fine running in a different directory, so we'll make a <code>build/</code> directory
to keep the project root clean. Then we'll generate a Makefile with CMake, run Make, and
run our program.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>mkdir<span class="w"> </span>build
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>build
$<span class="w"> </span>cmake<span class="w"> </span>..
--<span class="w"> </span>The<span class="w"> </span>C<span class="w"> </span>compiler<span class="w"> </span>identification<span class="w"> </span>is<span class="w"> </span>AppleClang<span class="w"> </span><span class="m">10</span>.0.1.10010046
--<span class="w"> </span>The<span class="w"> </span>CXX<span class="w"> </span>compiler<span class="w"> </span>identification<span class="w"> </span>is<span class="w"> </span>AppleClang<span class="w"> </span><span class="m">10</span>.0.1.10010046
--<span class="w"> </span>Check<span class="w"> </span><span class="k">for</span><span class="w"> </span>working<span class="w"> </span>C<span class="w"> </span>compiler:<span class="w"> </span>/Library/Developer/CommandLineTools/usr/bin/cc
--<span class="w"> </span>Check<span class="w"> </span><span class="k">for</span><span class="w"> </span>working<span class="w"> </span>C<span class="w"> </span>compiler:<span class="w"> </span>/Library/Developer/CommandLineTools/usr/bin/cc<span class="w"> </span>--<span class="w"> </span>works
--<span class="w"> </span>Detecting<span class="w"> </span>C<span class="w"> </span>compiler<span class="w"> </span>ABI<span class="w"> </span>info
--<span class="w"> </span>Detecting<span class="w"> </span>C<span class="w"> </span>compiler<span class="w"> </span>ABI<span class="w"> </span>info<span class="w"> </span>-<span class="w"> </span><span class="k">done</span>
--<span class="w"> </span>Detecting<span class="w"> </span>C<span class="w"> </span>compile<span class="w"> </span>features
--<span class="w"> </span>Detecting<span class="w"> </span>C<span class="w"> </span>compile<span class="w"> </span>features<span class="w"> </span>-<span class="w"> </span><span class="k">done</span>
--<span class="w"> </span>Check<span class="w"> </span><span class="k">for</span><span class="w"> </span>working<span class="w"> </span>CXX<span class="w"> </span>compiler:<span class="w"> </span>/Library/Developer/CommandLineTools/usr/bin/c++
--<span class="w"> </span>Check<span class="w"> </span><span class="k">for</span><span class="w"> </span>working<span class="w"> </span>CXX<span class="w"> </span>compiler:<span class="w"> </span>/Library/Developer/CommandLineTools/usr/bin/c++<span class="w"> </span>--<span class="w"> </span>works
--<span class="w"> </span>Detecting<span class="w"> </span>CXX<span class="w"> </span>compiler<span class="w"> </span>ABI<span class="w"> </span>info
--<span class="w"> </span>Detecting<span class="w"> </span>CXX<span class="w"> </span>compiler<span class="w"> </span>ABI<span class="w"> </span>info<span class="w"> </span>-<span class="w"> </span><span class="k">done</span>
--<span class="w"> </span>Detecting<span class="w"> </span>CXX<span class="w"> </span>compile<span class="w"> </span>features
--<span class="w"> </span>Detecting<span class="w"> </span>CXX<span class="w"> </span>compile<span class="w"> </span>features<span class="w"> </span>-<span class="w"> </span><span class="k">done</span>
--<span class="w"> </span>Configuring<span class="w"> </span><span class="k">done</span>
--<span class="w"> </span>Generating<span class="w"> </span><span class="k">done</span>
--<span class="w"> </span>Build<span class="w"> </span>files<span class="w"> </span>have<span class="w"> </span>been<span class="w"> </span>written<span class="w"> </span>to:<span class="w"> </span>/Users/philipeaton/tmp/testy/build
$<span class="w"> </span>make
<span class="o">[</span><span class="w"> </span><span class="m">25</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/customer.c.o
<span class="o">[</span><span class="w"> </span><span class="m">50</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/widget.c.o
<span class="o">[</span><span class="w"> </span><span class="m">75</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/main.c.o
<span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>C<span class="w"> </span>executable<span class="w"> </span>testy
<span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>testy
$<span class="w"> </span>./testy
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">1</span>
</pre></div>
<h3 id="cmakelists.txt.in">CMakeLists.txt.in</h3><p>This template file handles downloading the gtest dependency from
github.com pinned to a release. It will be copied into a subdirectory
during the <code>cmake ..</code> step.</p>
<div class="highlight"><pre><span></span><span class="nb">cmake_minimum_required</span><span class="p">(</span><span class="s">VERSION</span><span class="w"> </span><span class="s">3.1</span><span class="p">)</span>
<span class="nb">project</span><span class="p">(</span><span class="s">googletest-download</span><span class="w"> </span><span class="s">NONE</span><span class="p">)</span>
<span class="nb">include</span><span class="p">(</span><span class="s">ExternalProject</span><span class="p">)</span>
<span class="nb">ExternalProject_Add</span><span class="p">(</span><span class="s">googletest</span>
<span class="w"> </span><span class="s">GIT_REPOSITORY</span><span class="w"> </span><span class="s">https://github.com/google/googletest.git</span>
<span class="w"> </span><span class="s">GIT_TAG</span><span class="w"> </span><span class="s">release-1.8.1</span>
<span class="w"> </span><span class="s">SOURCE_DIR</span><span class="w"> </span><span class="s2">"${CMAKE_BINARY_DIR}/googletest-src"</span>
<span class="w"> </span><span class="s">BINARY_DIR</span><span class="w"> </span><span class="s2">"${CMAKE_BINARY_DIR}/googletest-build"</span>
<span class="w"> </span><span class="s">CONFIGURE_COMMAND</span><span class="w"> </span><span class="s2">""</span>
<span class="w"> </span><span class="s">BUILD_COMMAND</span><span class="w"> </span><span class="s2">""</span>
<span class="w"> </span><span class="s">INSTALL_COMMAND</span><span class="w"> </span><span class="s2">""</span>
<span class="w"> </span><span class="s">TEST_COMMAND</span><span class="w"> </span><span class="s2">""</span>
<span class="p">)</span>
</pre></div>
<p>Now we can tell CMake about it, and how to build it, in the top-level
CMakeLists.txt file.</p>
<div class="highlight"><pre><span></span><span class="nb">cmake_minimum_required</span><span class="p">(</span><span class="s">VERSION</span><span class="w"> </span><span class="s">3.1</span><span class="p">)</span>
<span class="nb">project</span><span class="p">(</span><span class="s">testy</span><span class="p">)</span>
<span class="c">##</span>
<span class="c">### Test definitions ###</span>
<span class="c">##</span>
<span class="nb">configure_file</span><span class="p">(</span><span class="s">CMakeLists.txt.in</span>
<span class="w"> </span><span class="s">googletest-download/CMakeLists.txt</span><span class="p">)</span>
<span class="nb">execute_process</span><span class="p">(</span><span class="s">COMMAND</span><span class="w"> </span><span class="o">${</span><span class="nv">CMAKE_COMMAND</span><span class="o">}</span><span class="w"> </span><span class="s">-G</span><span class="w"> </span><span class="s2">"${CMAKE_GENERATOR}"</span><span class="w"> </span><span class="s">.</span>
<span class="w"> </span><span class="s">WORKING_DIRECTORY</span><span class="w"> </span><span class="o">${</span><span class="nv">CMAKE_BINARY_DIR</span><span class="o">}</span><span class="s">/googletest-download</span><span class="w"> </span><span class="p">)</span>
<span class="nb">execute_process</span><span class="p">(</span><span class="s">COMMAND</span><span class="w"> </span><span class="o">${</span><span class="nv">CMAKE_COMMAND</span><span class="o">}</span><span class="w"> </span><span class="s">--build</span><span class="w"> </span><span class="s">.</span>
<span class="w"> </span><span class="s">WORKING_DIRECTORY</span><span class="w"> </span><span class="o">${</span><span class="nv">CMAKE_BINARY_DIR</span><span class="o">}</span><span class="s">/googletest-download</span><span class="w"> </span><span class="p">)</span>
<span class="nb">add_subdirectory</span><span class="p">(</span><span class="o">${</span><span class="nv">CMAKE_BINARY_DIR</span><span class="o">}</span><span class="s">/googletest-src</span>
<span class="w"> </span><span class="o">${</span><span class="nv">CMAKE_BINARY_DIR</span><span class="o">}</span><span class="s">/googletest-build</span><span class="p">)</span>
<span class="nb">enable_testing</span><span class="p">()</span>
<span class="nb">add_subdirectory</span><span class="p">(</span><span class="s">test</span><span class="p">)</span>
<span class="c">##</span>
<span class="c">### Source definitions ###</span>
<span class="c">##</span>
<span class="nb">include_directories</span><span class="p">(</span><span class="s2">"${PROJECT_SOURCE_DIR}/include"</span><span class="p">)</span>
<span class="nb">file</span><span class="p">(</span><span class="s">GLOB</span><span class="w"> </span><span class="s">sources</span>
<span class="w"> </span><span class="s2">"${PROJECT_SOURCE_DIR}/include/testy/*.h"</span>
<span class="w"> </span><span class="s2">"${PROJECT_SOURCE_DIR}/src/*.c"</span><span class="p">)</span>
<span class="nb">add_executable</span><span class="p">(</span><span class="s">testy</span><span class="w"> </span><span class="o">${</span><span class="nv">sources</span><span class="o">}</span><span class="p">)</span>
</pre></div>
<p>The <code>add_subdirectory</code> calls register a directory that contains its
own CMakeLists.txt. It would fail now without a <code>CMakeLists.txt</code> file
in the <code>test/</code> directory.</p>
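<p>For comparison only: on newer CMake versions (3.14 and later, which is
an assumption beyond the 3.1 minimum used in this post) the built-in
<code>FetchContent</code> module can replace the template file and the two
<code>execute_process</code> calls. This is a swapped-in technique sketched
under that assumption, and it has not been tried against this exact
gtest release.</p>
<div class="highlight"><pre><span></span># Assumes CMake &gt;= 3.14 for FetchContent_MakeAvailable.
cmake_minimum_required(VERSION 3.14)
project(testy)

include(FetchContent)

# Same repository and pinned tag as CMakeLists.txt.in in this post.
FetchContent_Declare(googletest
  GIT_REPOSITORY https://github.com/google/googletest.git
  GIT_TAG release-1.8.1)

# Downloads the sources at configure time and calls add_subdirectory on
# them, which makes the gtest and gtest_main targets available.
FetchContent_MakeAvailable(googletest)

enable_testing()
add_subdirectory(test)
</pre></div>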
<h3 id="test/cmakelists.txt">test/CMakeLists.txt</h3><p>This final file registers a <code>${name}_tests</code> executable for each test file
(for example <code>customer_tests</code>), compiled against the source and test code, and includes the project header files.</p>
<div class="highlight"><pre><span></span><span class="nb">include_directories</span><span class="p">(</span><span class="s2">"${PROJECT_SOURCE_DIR}/include"</span><span class="p">)</span>
<span class="nb">file</span><span class="p">(</span><span class="s">GLOB</span><span class="w"> </span><span class="s">sources</span><span class="w"> </span><span class="s2">"${PROJECT_SOURCE_DIR}/src/*.c"</span><span class="p">)</span>
<span class="nb">list</span><span class="p">(</span><span class="s">REMOVE_ITEM</span><span class="w"> </span><span class="s">sources</span><span class="w"> </span><span class="s2">"${PROJECT_SOURCE_DIR}/src/main.c"</span><span class="p">)</span>
<span class="nb">file</span><span class="p">(</span><span class="s">GLOB</span><span class="w"> </span><span class="s">tests</span><span class="w"> </span><span class="s2">"${PROJECT_SOURCE_DIR}/test/*.cpp"</span><span class="p">)</span>
<span class="nb">list</span><span class="p">(</span><span class="s">REMOVE_ITEM</span><span class="w"> </span><span class="s">tests</span><span class="w"> </span><span class="s2">"${PROJECT_SOURCE_DIR}/test/main.cpp"</span><span class="p">)</span>
<span class="nb">foreach</span><span class="p">(</span><span class="s">file</span><span class="w"> </span><span class="o">${</span><span class="nv">tests</span><span class="o">}</span><span class="p">)</span>
<span class="w"> </span><span class="nb">set</span><span class="p">(</span><span class="s">name</span><span class="p">)</span>
<span class="w"> </span><span class="nb">get_filename_component</span><span class="p">(</span><span class="s">name</span><span class="w"> </span><span class="o">${</span><span class="nv">file</span><span class="o">}</span><span class="w"> </span><span class="s">NAME_WE</span><span class="p">)</span>
<span class="w"> </span><span class="nb">add_executable</span><span class="p">(</span><span class="s2">"${name}_tests"</span>
<span class="w"> </span><span class="o">${</span><span class="nv">sources</span><span class="o">}</span>
<span class="w"> </span><span class="o">${</span><span class="nv">file</span><span class="o">}</span>
<span class="w"> </span><span class="s2">"${PROJECT_SOURCE_DIR}/test/main.cpp"</span><span class="p">)</span>
<span class="w"> </span><span class="nb">target_link_libraries</span><span class="p">(</span><span class="s2">"${name}_tests"</span><span class="w"> </span><span class="s">gtest_main</span><span class="p">)</span>
<span class="w"> </span><span class="nb">add_test</span><span class="p">(</span><span class="s">NAME</span><span class="w"> </span><span class="o">${</span><span class="nv">name</span><span class="o">}</span><span class="w"> </span><span class="s">COMMAND</span><span class="w"> </span><span class="s2">"${name}_tests"</span><span class="p">)</span>
<span class="nb">endforeach</span><span class="p">()</span>
</pre></div>
<p>We have to register a test for each file; otherwise, each file's tests
won't show up by default (i.e. without a <code>--verbose</code> flag).</p>
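<p>If finer-grained reporting is wanted, CMake's bundled <code>GoogleTest</code>
module can register every individual <code>TEST</code> case with ctest. The
fragment below is a sketch of how the loop above might look with it. It
assumes CMake 3.10 or later (this post only requires 3.1) and relies on
the <code>sources</code> and <code>tests</code> variables defined earlier in the same
file.</p>
<div class="highlight"><pre><span></span># Assumes CMake &gt;= 3.10, where the GoogleTest module provides
# gtest_discover_tests.
include(GoogleTest)

foreach(file ${tests})
  get_filename_component(name ${file} NAME_WE)
  add_executable("${name}_tests"
    ${sources}
    ${file}
    "${PROJECT_SOURCE_DIR}/test/main.cpp")
  target_link_libraries("${name}_tests" gtest_main)
  # Replaces add_test: after the binary is built, it is run with
  # --gtest_list_tests and each test case it reports is registered
  # as a separate ctest entry.
  gtest_discover_tests("${name}_tests")
endforeach()
</pre></div>
<p>With this variant ctest lists one entry per test case rather than one
per test file.</p>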
<h4 id="building-and-running-tests">Building and running tests</h4><p>Similar to building and running the source, we run CMake in a
subdirectory but run <code>make test</code> or <code>ctest</code> after building all sources
and tests with <code>make</code>.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>build
$<span class="w"> </span>cmake<span class="w"> </span>..
--<span class="w"> </span>Configuring<span class="w"> </span><span class="k">done</span>
--<span class="w"> </span>Generating<span class="w"> </span><span class="k">done</span>
--<span class="w"> </span>Build<span class="w"> </span>files<span class="w"> </span>have<span class="w"> </span>been<span class="w"> </span>written<span class="w"> </span>to:<span class="w"> </span>/Users/philipeaton/tmp/testy/build/googletest-download
Scanning<span class="w"> </span>dependencies<span class="w"> </span>of<span class="w"> </span>target<span class="w"> </span>googletest
<span class="o">[</span><span class="w"> </span><span class="m">11</span>%<span class="o">]</span><span class="w"> </span>Creating<span class="w"> </span>directories<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">'googletest'</span>
<span class="o">[</span><span class="w"> </span><span class="m">22</span>%<span class="o">]</span><span class="w"> </span>Performing<span class="w"> </span>download<span class="w"> </span>step<span class="w"> </span><span class="o">(</span>git<span class="w"> </span>clone<span class="o">)</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">'googletest'</span>
Cloning<span class="w"> </span>into<span class="w"> </span><span class="s1">'googletest-src'</span>...
Note:<span class="w"> </span>checking<span class="w"> </span>out<span class="w"> </span><span class="s1">'release-1.8.1'</span>.
You<span class="w"> </span>are<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="s1">'detached HEAD'</span><span class="w"> </span>state.<span class="w"> </span>You<span class="w"> </span>can<span class="w"> </span>look<span class="w"> </span>around,<span class="w"> </span>make<span class="w"> </span>experimental
changes<span class="w"> </span>and<span class="w"> </span>commit<span class="w"> </span>them,<span class="w"> </span>and<span class="w"> </span>you<span class="w"> </span>can<span class="w"> </span>discard<span class="w"> </span>any<span class="w"> </span>commits<span class="w"> </span>you<span class="w"> </span>make<span class="w"> </span><span class="k">in</span><span class="w"> </span>this
state<span class="w"> </span>without<span class="w"> </span>impacting<span class="w"> </span>any<span class="w"> </span>branches<span class="w"> </span>by<span class="w"> </span>performing<span class="w"> </span>another<span class="w"> </span>checkout.
If<span class="w"> </span>you<span class="w"> </span>want<span class="w"> </span>to<span class="w"> </span>create<span class="w"> </span>a<span class="w"> </span>new<span class="w"> </span>branch<span class="w"> </span>to<span class="w"> </span>retain<span class="w"> </span>commits<span class="w"> </span>you<span class="w"> </span>create,<span class="w"> </span>you<span class="w"> </span>may
<span class="k">do</span><span class="w"> </span>so<span class="w"> </span><span class="o">(</span>now<span class="w"> </span>or<span class="w"> </span>later<span class="o">)</span><span class="w"> </span>by<span class="w"> </span>using<span class="w"> </span>-b<span class="w"> </span>with<span class="w"> </span>the<span class="w"> </span>checkout<span class="w"> </span><span class="nb">command</span><span class="w"> </span>again.<span class="w"> </span>Example:
<span class="w"> </span>git<span class="w"> </span>checkout<span class="w"> </span>-b<span class="w"> </span>&lt;new-branch-name&gt;
HEAD<span class="w"> </span>is<span class="w"> </span>now<span class="w"> </span>at<span class="w"> </span>2fe3bd99<span class="w"> </span>Merge<span class="w"> </span>pull<span class="w"> </span>request<span class="w"> </span><span class="c1">#1433 from dsacre/fix-clang-warnings</span>
<span class="o">[</span><span class="w"> </span><span class="m">33</span>%<span class="o">]</span><span class="w"> </span>No<span class="w"> </span>patch<span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">'googletest'</span>
<span class="o">[</span><span class="w"> </span><span class="m">44</span>%<span class="o">]</span><span class="w"> </span>Performing<span class="w"> </span>update<span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">'googletest'</span>
<span class="o">[</span><span class="w"> </span><span class="m">55</span>%<span class="o">]</span><span class="w"> </span>No<span class="w"> </span>configure<span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">'googletest'</span>
<span class="o">[</span><span class="w"> </span><span class="m">66</span>%<span class="o">]</span><span class="w"> </span>No<span class="w"> </span>build<span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">'googletest'</span>
<span class="o">[</span><span class="w"> </span><span class="m">77</span>%<span class="o">]</span><span class="w"> </span>No<span class="w"> </span>install<span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">'googletest'</span>
<span class="o">[</span><span class="w"> </span><span class="m">88</span>%<span class="o">]</span><span class="w"> </span>No<span class="w"> </span><span class="nb">test</span><span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">'googletest'</span>
<span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Completed<span class="w"> </span><span class="s1">'googletest'</span>
<span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>googletest
--<span class="w"> </span>Found<span class="w"> </span>PythonInterp:<span class="w"> </span>/usr/local/bin/python<span class="w"> </span><span class="o">(</span>found<span class="w"> </span>version<span class="w"> </span><span class="s2">"2.7.16"</span><span class="o">)</span>
--<span class="w"> </span>Looking<span class="w"> </span><span class="k">for</span><span class="w"> </span>pthread.h
--<span class="w"> </span>Looking<span class="w"> </span><span class="k">for</span><span class="w"> </span>pthread.h<span class="w"> </span>-<span class="w"> </span>found
--<span class="w"> </span>Performing<span class="w"> </span>Test<span class="w"> </span>CMAKE_HAVE_LIBC_PTHREAD
--<span class="w"> </span>Performing<span class="w"> </span>Test<span class="w"> </span>CMAKE_HAVE_LIBC_PTHREAD<span class="w"> </span>-<span class="w"> </span>Success
--<span class="w"> </span>Found<span class="w"> </span>Threads:<span class="w"> </span>TRUE
--<span class="w"> </span>Configuring<span class="w"> </span><span class="k">done</span>
--<span class="w"> </span>Generating<span class="w"> </span><span class="k">done</span>
--<span class="w"> </span>Build<span class="w"> </span>files<span class="w"> </span>have<span class="w"> </span>been<span class="w"> </span>written<span class="w"> </span>to:<span class="w"> </span>/Users/philipeaton/tmp/testy/build
$<span class="w"> </span>make
<span class="o">[</span><span class="w"> </span><span class="m">4</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/customer.c.o
<span class="o">[</span><span class="w"> </span><span class="m">9</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/widget.c.o
<span class="o">[</span><span class="w"> </span><span class="m">13</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/main.c.o
<span class="o">[</span><span class="w"> </span><span class="m">18</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>C<span class="w"> </span>executable<span class="w"> </span>testy
<span class="o">[</span><span class="w"> </span><span class="m">18</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>testy
<span class="o">[</span><span class="w"> </span><span class="m">22</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>googletest-build/googlemock/gtest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
<span class="o">[</span><span class="w"> </span><span class="m">27</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>static<span class="w"> </span>library<span class="w"> </span>libgtest.a
<span class="o">[</span><span class="w"> </span><span class="m">27</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>gtest
<span class="o">[</span><span class="w"> </span><span class="m">31</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>googletest-build/googlemock/CMakeFiles/gmock.dir/src/gmock-all.cc.o
<span class="o">[</span><span class="w"> </span><span class="m">36</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>static<span class="w"> </span>library<span class="w"> </span>libgmock.a
<span class="o">[</span><span class="w"> </span><span class="m">36</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>gmock
<span class="o">[</span><span class="w"> </span><span class="m">40</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>googletest-build/googlemock/CMakeFiles/gmock_main.dir/src/gmock_main.cc.o
<span class="o">[</span><span class="w"> </span><span class="m">45</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>static<span class="w"> </span>library<span class="w"> </span>libgmock_main.a
<span class="o">[</span><span class="w"> </span><span class="m">45</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>gmock_main
<span class="o">[</span><span class="w"> </span><span class="m">50</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>googletest-build/googlemock/gtest/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o
<span class="o">[</span><span class="w"> </span><span class="m">54</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>static<span class="w"> </span>library<span class="w"> </span>libgtest_main.a
<span class="o">[</span><span class="w"> </span><span class="m">54</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>gtest_main
<span class="o">[</span><span class="w"> </span><span class="m">59</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/customer_tests.dir/__/src/customer.c.o
<span class="o">[</span><span class="w"> </span><span class="m">63</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/customer_tests.dir/__/src/widget.c.o
<span class="o">[</span><span class="w"> </span><span class="m">68</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/customer_tests.dir/customer.cpp.o
<span class="o">[</span><span class="w"> </span><span class="m">72</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/customer_tests.dir/main.cpp.o
<span class="o">[</span><span class="w"> </span><span class="m">77</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>executable<span class="w"> </span>customer_tests
<span class="o">[</span><span class="w"> </span><span class="m">77</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>customer_tests
Scanning<span class="w"> </span>dependencies<span class="w"> </span>of<span class="w"> </span>target<span class="w"> </span>widget_tests
<span class="o">[</span><span class="w"> </span><span class="m">81</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/widget_tests.dir/__/src/customer.c.o
<span class="o">[</span><span class="w"> </span><span class="m">86</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/widget_tests.dir/__/src/widget.c.o
<span class="o">[</span><span class="w"> </span><span class="m">90</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/widget_tests.dir/widget.cpp.o
<span class="o">[</span><span class="w"> </span><span class="m">95</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/widget_tests.dir/main.cpp.o
<span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>executable<span class="w"> </span>widget_tests
<span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>widget_tests
</pre></div>
<p>After running <code>cmake</code> and <code>make</code>, we're finally ready to run <code>ctest</code>.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>ctest
Test<span class="w"> </span>project<span class="w"> </span>/Users/philipeaton/tmp/testy/build
<span class="w"> </span>Start<span class="w"> </span><span class="m">1</span>:<span class="w"> </span>customer
<span class="m">1</span>/2<span class="w"> </span>Test<span class="w"> </span><span class="c1">#1: customer .......................... Passed 0.01 sec</span>
<span class="w"> </span>Start<span class="w"> </span><span class="m">2</span>:<span class="w"> </span>widget
<span class="m">2</span>/2<span class="w"> </span>Test<span class="w"> </span><span class="c1">#2: widget ............................ Passed 0.00 sec</span>
<span class="m">100</span>%<span class="w"> </span>tests<span class="w"> </span>passed,<span class="w"> </span><span class="m">0</span><span class="w"> </span>tests<span class="w"> </span>failed<span class="w"> </span>out<span class="w"> </span>of<span class="w"> </span><span class="m">2</span>
Total<span class="w"> </span>Test<span class="w"> </span><span class="nb">time</span><span class="w"> </span><span class="o">(</span>real<span class="o">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span>.01<span class="w"> </span>sec
</pre></div>
<p>Now we're in a good place with most of the challenges of unit testing
C code (i.e. ignoring mocks) past us.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">In preparation for a couple new articles on some C projects, here's a foundational post on building C code and writing/running unit tests with gtest and cmake <a href="https://t.co/aMVyr7LO73">https://t.co/aMVyr7LO73</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1167826536298405894?ref_src=twsrc%5Etfw">August 31, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/unit-testing-c-code-with-gtest.htmlSat, 31 Aug 2019 00:00:00 +0000
- Writing an x86 emulator from scratch in JavaScript: 2. system callshttp://notes.eatonphil.com/emulator-basics-system-calls.html<p class="note">
Previously in emulator basics:
<! forgive me, for I have sinned >
<br />
<a href="/emulator-basics-a-stack-and-register-machine.html">1. a stack and register machine</a>
</p><p>In this post we'll extend <a href="https://github.com/eatonphil/x86e">x86e</a> to
support the exit and write Linux system calls, or syscalls. A syscall
is a function handled by the kernel that allows the process to
interact with data outside of its memory. The <code>SYSCALL</code>
instruction takes its arguments in nearly the same registers as a
regular function call made with <code>CALL</code> (on Linux the fourth
argument goes in <code>R10</code> rather than <code>RCX</code>). But <code>SYSCALL</code>
additionally requires the <code>RAX</code> register to contain the
integer number of the syscall.</p>
<p>Historically, there have been a number of different ways to make
syscalls. All of them transfer control from the process to a handler
in the kernel. Before AMD64, on x86 processors, there was the
<code>SYSENTER</code> instruction. And before that there was only
<code>INT 80h</code>, a software interrupt that triggers the syscall
handler (interrupts can be used for more than just syscalls). The
newer instructions were added for efficiency as processors and their
use by operating systems evolved.</p>
<p>Since this is a general need and AMD64 processors are among the most
common today, you'll see similar code in every modern operating system
such as FreeBSD, OpenBSD, NetBSD, macOS, and Linux. (I have no
background in Windows.) The calling convention may differ (e.g. which
arguments are in which registers) and the syscall numbers differ.
Even within Linux both the calling convention and the syscall numbers
differ between x86 (32-bit) and AMD64/x86_64 (64-bit) versions.</p>
<p>See this <a href="https://stackoverflow.com/a/15169141/1507139">StackOverflow
post</a> for some more
detail.</p>
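<p>To make the difference concrete, here is a small sketch of the two
Linux conventions written as plain JavaScript data. The names
<code>LINUX_SYSCALL_ABI</code> and <code>describeSyscall</code> are made up for
illustration and are not part of x86e; the register orders and syscall
numbers are the ones used by the Linux i386 and x86_64 ABIs.</p>

```javascript
// Illustrative only: these names are hypothetical, not part of x86e.
const LINUX_SYSCALL_ABI = {
  // 32-bit x86, entered via INT 80h
  i386: {
    numberRegister: 'EAX',
    argumentRegisters: ['EBX', 'ECX', 'EDX', 'ESI', 'EDI', 'EBP'],
    numbers: { exit: 1, write: 4 },
  },
  // 64-bit AMD64/x86_64, entered via SYSCALL
  x86_64: {
    numberRegister: 'RAX',
    argumentRegisters: ['RDI', 'RSI', 'RDX', 'R10', 'R8', 'R9'],
    numbers: { write: 1, exit: 60 },
  },
};

// Summarize where the syscall number and the first argCount arguments go.
function describeSyscall(arch, name, argCount) {
  const abi = LINUX_SYSCALL_ABI[arch];
  const registers = abi.argumentRegisters.slice(0, argCount).join(', ');
  return `${abi.numberRegister}=${abi.numbers[name]}, args in ${registers}`;
}
```

<p>For instance, <code>describeSyscall('x86_64', 'write', 3)</code> describes
the same register setup used in the write example later in this post.</p>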
<p><a href="https://gist.github.com/eatonphil/2d16bc3dae33bff8a8d7f2a9d13025c3">Code for this post in full is available as a
Gist.</a></p>
<h4 id="exit">Exit</h4><p>The exit syscall is how a child process communicates with the process
that spawned it (its parent) when the child is finished running. Exit
takes one argument, called the exit code or status code. Only the low
8 bits are reported to the parent, so the effective range is 0 to 255.
By convention, shells report a process that was killed by a signal
(such as a segfault) with a status of 128 plus the signal number, so
values above 128 usually indicate an abnormal exit. Shells additionally
interpret any non-zero exit code as a "failure". Otherwise, and
ignoring these two common conventions, it can be used to mean anything
the programmer wants.</p>
<p class="note">
The wait syscall is how the parent process can block until exit is
called by the child and receive its exit code.
</p><p>On AMD64 Linux the syscall number is 60. For example:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="mi">60</span>
<span class="w"> </span><span class="nf">SYSCALL</span>
</pre></div>
<p>This calls exit with a status code of 0.</p>
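<p>As a rough illustration of the 8-bit limit, the sketch below mimics
how a larger or negative value passed to exit would appear to the
parent on Linux. The helper name <code>visibleExitStatus</code> is hypothetical
and is not part of the emulator.</p>

```javascript
// Hypothetical helper: only the low 8 bits of an exit status reach
// the parent process, so model that truncation here.
function visibleExitStatus(status) {
  // BigInt.asUintN keeps the low N bits and treats them as unsigned.
  return Number(BigInt.asUintN(8, BigInt(status)));
}

const examples = [0, 3, 255, 256, 300, -1].map((status) => ({
  requested: status,
  visible: visibleExitStatus(status),
}));
```

<p>Values in the 0 to 255 range pass through unchanged; anything else
wraps around.</p>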
<h4 id="write">Write</h4><p>The write syscall is how a process can send data to file descriptors,
which are integers representing some file-like object. By default, a
Linux process is given access to three file descriptors with
consistent integer values: stdin is 0, stdout is 1, and stderr is 2.
Write takes three arguments: the file descriptor integer to write
to, a starting address to memory that is interpreted as a byte array,
and the number of bytes to write to the file descriptor beginning at
the start address.</p>
<p>On AMD64 Linux the syscall number is 1. For example:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="c1">; stdout</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RSI</span><span class="p">,</span><span class="w"> </span><span class="nb">R12</span><span class="w"> </span><span class="c1">; address of string</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDX</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="c1">; 8 bytes to write</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="c1">; write</span>
<span class="w"> </span><span class="nf">SYSCALL</span>
</pre></div>
<p>This writes 8 bytes to stdout starting from the string whose address
is in R12.</p>
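<p>For comparison, Node's own <code>fs.writeSync(fd, buffer, offset, length)</code>
has nearly the same shape as the syscall. The sketch below is only an
analogy: <code>writeFromMemory</code> is a hypothetical helper, with a
<code>Buffer</code> standing in for process memory and the offset standing in
for the address held in <code>RSI</code>.</p>

```javascript
const fs = require('fs');

// Hypothetical helper mirroring write(fd, address, count).
// `memory` stands in for process memory; `address` is an offset into it.
function writeFromMemory(fd, memory, address, count) {
  // fs.writeSync returns the number of bytes actually written.
  return fs.writeSync(fd, memory, address, count);
}

// Example usage (left commented out so that loading this file has no
// side effects): skip the 4 padding bytes and write 9 bytes to stdout.
// writeFromMemory(1, Buffer.from('....Hi there\n'), 4, 9);
```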
<h3 id="implementing-syscalls">Implementing syscalls</h3><p>Our emulator is simplistic and is currently only implementing process
emulation, not full CPU emulation. So the syscalls themselves will be
handled in JavaScript. First we'll write out stubs for the two
syscalls we are adding. And we'll provide a map from syscall id to the
syscall.</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALLS_BY_ID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="mf">1</span><span class="o">:</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">sys_write</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="mf">60</span><span class="o">:</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">sys_exit</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{},</span>
<span class="p">};</span>
</pre></div>
<p>We need to add an instruction handler to our instruction switch. In
doing so we must convert the value in <code>RAX</code> from a BigInt
to a regular Number so we can look it up in the syscall map.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'syscall'</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">idNumber</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Number</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RAX</span><span class="p">);</span>
<span class="w"> </span><span class="nx">SYSCALLS_BY_ID</span><span class="p">[</span><span class="nx">idNumber</span><span class="p">](</span><span class="nx">process</span><span class="p">);</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
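<p>One thing the handler above does not do is guard against a syscall
number that is missing from the map, in which case the lookup returns
<code>undefined</code> and calling it throws a generic <code>TypeError</code>. A
possible variant with a clearer error might look like the following
sketch. <code>dispatchSyscall</code> and <code>SYSCALL_TABLE</code> are hypothetical
names, and the stub handler only sets a flag instead of exiting.</p>

```javascript
// Stub table standing in for SYSCALLS_BY_ID; the handler just records
// that it ran so the sketch has no side effects.
const SYSCALL_TABLE = {
  60: function sys_exit(proc) {
    proc.exited = true;
  },
};

// Hypothetical standalone version of the 'syscall' case.
function dispatchSyscall(proc, table) {
  const idNumber = Number(proc.registers.RAX);
  const handler = table[idNumber];
  if (typeof handler !== 'function') {
    throw new Error(`Unsupported syscall: ${idNumber}`);
  }
  handler(proc);
  proc.registers.RIP++; // works for BigInt registers as well as Numbers
}
```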
<h4 id="exit">Exit</h4><p>Exit is really simple. It will be implemented by calling Node's
<code>global.process.exit()</code>. Again we'll need to convert the
register's BigInt value to a Number.</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALLS_BY_ID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="mf">1</span><span class="o">:</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">sys_write</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="mf">60</span><span class="o">:</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">sys_exit</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nb">global</span><span class="p">.</span><span class="nx">process</span><span class="p">.</span><span class="nx">exit</span><span class="p">(</span><span class="nb">Number</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RDI</span><span class="p">));</span>
<span class="w"> </span><span class="p">},</span>
<span class="p">};</span>
</pre></div>
<h4 id="write">Write</h4><p>Write will be implemented by iterating over the process memory as
bytes and by calling <code>write()</code> on the relevant file
descriptor. We'll store a map of these on the process object. The
snippet below only wires up stdout; stdin and stderr proxies can be
added the same way.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">(</span><span class="nx">file</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">process</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">registers</span><span class="p">,</span>
<span class="w"> </span><span class="nx">memory</span><span class="p">,</span>
<span class="w"> </span><span class="nx">instructions</span><span class="p">,</span>
<span class="w"> </span><span class="nx">labels</span><span class="p">,</span>
<span class="w"> </span><span class="nx">fd</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// stdout</span>
<span class="w"> </span><span class="mf">1</span><span class="o">:</span><span class="w"> </span><span class="nb">global</span><span class="p">.</span><span class="nx">process</span><span class="p">.</span><span class="nx">stdout</span><span class="p">,</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<p>The base address is stored in <code>RSI</code> and the number of bytes to
write is stored in <code>RDX</code>. The file descriptor to write
to is stored in <code>RDI</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALLS_BY_ID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="mf">1</span><span class="o">:</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">sys_write</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSI</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Number</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RDX</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">bytes</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="kr">byte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">readMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="kr">char</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span><span class="p">.</span><span class="nx">fromCharCode</span><span class="p">(</span><span class="nb">Number</span><span class="p">(</span><span class="kr">byte</span><span class="p">));</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">fd</span><span class="p">[</span><span class="nb">Number</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RDI</span><span class="p">)].</span><span class="nx">write</span><span class="p">(</span><span class="kr">char</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="p">...</span>
</pre></div>
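<p>Writing one character at a time is fine for ASCII, but converting
each byte with <code>String.fromCharCode</code> and writing it as a string means
bytes above 127 get re-encoded as UTF-8 on the way out. A possible
alternative is to copy the bytes into a <code>Buffer</code> and write them in
one call. The sketch below assumes process memory is a flat
<code>Uint8Array</code>, which may not match x86e's actual memory layout, and
<code>sysWriteBuffered</code> is a hypothetical name. It also stores the byte
count in <code>RAX</code>, which is where a real kernel returns the result.</p>

```javascript
// Assumption: proc.memory is a flat Uint8Array indexed by address.
function sysWriteBuffered(proc) {
  const fd = Number(proc.registers.RDI);
  const start = Number(proc.registers.RSI);
  const count = Number(proc.registers.RDX);

  // Copy the requested byte range out of memory as raw bytes.
  const bytes = Buffer.from(proc.memory.subarray(start, start + count));
  proc.fd[fd].write(bytes);

  // The write syscall returns the number of bytes written in RAX.
  proc.registers.RAX = BigInt(count);
}
```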
<h3 id="all-together">All together</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>exit3.asm
main:
<span class="w"> </span>MOV<span class="w"> </span>RDI,<span class="w"> </span><span class="m">1</span>
<span class="w"> </span>MOV<span class="w"> </span>RSI,<span class="w"> </span><span class="m">2</span>
<span class="w"> </span>ADD<span class="w"> </span>RDI,<span class="w"> </span>RSI
<span class="w"> </span>MOV<span class="w"> </span>RAX,<span class="w"> </span><span class="m">60</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="nb">exit</span>
<span class="w"> </span>SYSCALL
$<span class="w"> </span>node<span class="w"> </span>emulator.js<span class="w"> </span>exit3.asm
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">3</span>
</pre></div>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>hello.asm
main:
<span class="w"> </span>PUSH<span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="se">\n</span>
<span class="w"> </span>PUSH<span class="w"> </span><span class="m">33</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>!
<span class="w"> </span>PUSH<span class="w"> </span><span class="m">111</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>o
<span class="w"> </span>PUSH<span class="w"> </span><span class="m">108</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>l
<span class="w"> </span>PUSH<span class="w"> </span><span class="m">108</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>l
<span class="w"> </span>PUSH<span class="w"> </span><span class="m">101</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>e
<span class="w"> </span>PUSH<span class="w"> </span><span class="m">72</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>H
<span class="w"> </span>MOV<span class="w"> </span>RDI,<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>stdout
<span class="w"> </span>MOV<span class="w"> </span>RSI,<span class="w"> </span>RSP<span class="w"> </span><span class="p">;</span><span class="w"> </span>address<span class="w"> </span>of<span class="w"> </span>string
<span class="w"> </span>MOV<span class="w"> </span>RDX,<span class="w"> </span><span class="m">56</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="m">7</span><span class="w"> </span><span class="m">8</span>-bit<span class="w"> </span>characters<span class="w"> </span><span class="k">in</span><span class="w"> </span>the<span class="w"> </span>string<span class="w"> </span>but<span class="w"> </span>PUSH<span class="w"> </span>acts<span class="w"> </span>on<span class="w"> </span><span class="m">64</span>-bit<span class="w"> </span>integers
<span class="w"> </span>MOV<span class="w"> </span>RAX,<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>write
<span class="w"> </span>SYSCALL
<span class="w"> </span>MOV<span class="w"> </span>RDI,<span class="w"> </span><span class="m">0</span>
<span class="w"> </span>MOV<span class="w"> </span>RAX,<span class="w"> </span><span class="m">60</span>
<span class="w"> </span>SYSCALL
$<span class="w"> </span>node<span class="w"> </span>emulator.js<span class="w"> </span>hello.asm
Hello!
$
</pre></div>
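<p>The 56 in <code>hello.asm</code> comes from 7 pushes of 8 bytes each. The
sketch below simulates those pushes on a small little-endian stack
that grows downward, to show why reading 56 bytes from <code>RSP</code> yields
the characters in order with zero bytes in between. The 64-byte stack
and the <code>push</code> helper are assumptions for illustration and may not
match how x86e lays out memory.</p>

```javascript
// Simulate seven 64-bit PUSHes onto a downward-growing stack.
const stack = new Uint8Array(64);
const view = new DataView(stack.buffer);
let rsp = stack.length; // the stack pointer starts at the top

function push(value) {
  rsp -= 8; // each PUSH reserves 8 bytes
  view.setBigUint64(rsp, BigInt(value), true); // true = little-endian
}

// Same order as hello.asm: '\n' first, 'H' last.
[10, 33, 111, 108, 108, 101, 72].forEach(push);

// The 56 bytes starting at RSP: each character followed by 7 zero bytes.
const written = stack.subarray(rsp, rsp + 56);
const visible = String.fromCharCode(...written.filter((b) => b !== 0));
```

<p>Because the last value pushed sits at the lowest address, the string
has to be pushed in reverse. The zero bytes are still written to
stdout; most terminals just don't display them.</p>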
<h3 id="next-steps">Next steps</h3><p>We still aren't setting flags appropriately to support conditionals,
so that's low-hanging fruit. There are some other fun syscalls to
implement that would also give us access to an emulated VGA card so we
could render graphics. Syntactic support for string constants would
also be convenient and more efficient.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Latest post in the emulator basics series up: implementing some syscalls starting with sys_exit and sys_write so we can print a nice hello message. <a href="https://t.co/NEfId0lnJx">https://t.co/NEfId0lnJx</a> <a href="https://twitter.com/hashtag/javascript?src=hash&ref_src=twsrc%5Etfw">#javascript</a> <a href="https://twitter.com/hashtag/x86?src=hash&ref_src=twsrc%5Etfw">#x86</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1152689255900176386?ref_src=twsrc%5Etfw">July 20, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/emulator-basics-system-calls.htmlSat, 20 Jul 2019 00:00:00 +0000
- Writing a lisp compiler from scratch in JavaScript: 6. LLVM system callshttp://notes.eatonphil.com/compiler-basics-llvm-system-calls.html<p class="note">
Previously in compiler basics:
<! forgive me, for I have sinned >
<br />
<a href="/compiler-basics-lisp-to-assembly.html">1. lisp to assembly</a>
<br />
<a href="/compiler-basics-functions.html">2. user-defined functions and variables</a>
<br />
<a href="/compiler-basics-llvm.html">3. LLVM</a>
<br />
<a href="/compiler-basics-llvm-conditionals.html">4. LLVM conditionals and compiling fibonacci</a>
<br />
Next in compiler basics:
<br />
<a href="/compiler-basics-an-x86-upgrade.html">5. an x86 upgrade</a>
</p><p>In this post we'll extend the <a href="https://github.com/eatonphil/ulisp">ulisp
compiler</a>'s LLVM backend to
support printing integers to stdout.</p>
<h3 id="exit-code-limitations">Exit code limitations</h3><p>Until now we've validated program state by setting the exit code to
the result of the program computation. But the exit code is an
eight-bit integer. What if we want to validate a computation that produces
a result larger than 255?</p>
<p>To do this we need a way to print integers. This is challenging
because printing normally deals with byte arrays. libc's
<code>printf</code>, for example, takes a byte array as its first
argument.</p>
<p>The shortest path forward is to add support for system calls so we can
print one character at a time. Here's a version of a <code>print</code>
form that hacks around not having arrays by sending each digit of a
number to stdout, one at a time.</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">print-char</span><span class="w"> </span><span class="p">(</span><span class="nv">c</span><span class="p">)</span>
<span class="w"> </span><span class="c1">; First argument is stdout</span>
<span class="w"> </span><span class="c1">; Second argument is a pointer to a char array (of length one)</span>
<span class="w"> </span><span class="c1">; Third argument is the length of the char array</span>
<span class="w"> </span><span class="p">(</span><span class="nv">syscall/sys_write</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="nv">&c</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span>
<span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nb">print</span><span class="w"> </span><span class="p">(</span><span class="nv">n</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">></span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">9</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nb">print</span><span class="w"> </span><span class="p">(</span><span class="nb">/</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">10</span><span class="p">)))</span>
<span class="w"> </span><span class="c1">; 48 is the ASCII code for '0'</span>
<span class="w"> </span><span class="p">(</span><span class="nv">print-char</span><span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">48</span><span class="w"> </span><span class="p">(</span><span class="nv">%</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">10</span><span class="p">))))</span>
</pre></div>
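<p>To sanity-check the recursion, here is the same logic mirrored in
JavaScript. This is only a sketch: <code>printDigits</code> and
<code>digitsToString</code> are hypothetical names, and the characters are
collected into a string instead of being written with a syscall.</p>

```javascript
// JavaScript mirror of the ulisp print form above.
function printDigits(n, printChar) {
  if (n > 9) {
    // Print the higher-order digits first.
    printDigits(Math.floor(n / 10), printChar);
  }
  // 48 is the ASCII code for '0'
  printChar(48 + (n % 10));
}

// Collect the emitted character codes into a string for inspection.
function digitsToString(n) {
  let out = '';
  printDigits(n, (code) => {
    out += String.fromCharCode(code);
  });
  return out;
}
```

<p>The recursive call happens before the current digit is printed, which
is what puts the most significant digit first.</p>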
<p>In order to support this we need to add the
<code>syscall/sys_write</code>, <code>></code>, <code>%</code>,
and <code>/</code> builtin forms. We'll also need to add support for
taking the address of a variable.</p>
<p>All <a href="https://github.com/eatonphil/ulisp">code is available on GitHub</a>
as is the <a href="https://github.com/eatonphil/ulisp/commit/213b83b8e952c210ba408bf38e59ae677d19e643">particular commit related to this
post</a>.</p>
<h3 id="references">References</h3><p>The <code>sys_write</code> syscall requires us to pass the memory
address of the byte array to write. We don't support arrays, but we
can treat an individual variable as an array of length one by passing
the variable's address.</p>
<p>If we were compiling to C we could just pass the address of a local
variable. But LLVM doesn't allow us to take the address of variables
directly. We need to push the variable onto the LLVM stack to get an
address.</p>
<p class="note">
Under the hood LLVM will likely optimize this into a local variable
reference instead of first pushing to the stack.
</p><p>Since LLVM IR is typed, the value representing the address of a local
variable will be a pointer type. We'll need to refer to all uses of
this value as a pointer. So we'll need to modify ulisp to track local
types rather than hard-coding <code>i64</code> everywhere.</p>
<h4 id="scope">Scope</h4><p>To begin we'll modify the <code>Scope</code> class to track types. We
only need to do this on registration. Afterward, we'll have to find
all uses of local variables to make sure they use the
local's <code>value</code> and <code>type</code> fields appropriately.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">register</span><span class="p">(</span><span class="nx">local</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">local</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/-/g</span><span class="p">,</span><span class="w"> </span><span class="s1">'_'</span><span class="p">);</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">[</span><span class="nx">copy</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">local</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">n</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">[</span><span class="nx">local</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="o">:</span><span class="w"> </span><span class="nx">copy</span><span class="p">,</span>
<span class="w"> </span><span class="nx">type</span><span class="o">:</span><span class="w"> </span><span class="s1">'i64'</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">[</span><span class="nx">local</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<p>We won't go through every use of a <code>Scope</code> variable in this
post, but you can find it in the related <a href="https://github.com/eatonphil/ulisp/commit/213b83b8e952c210ba408bf38e59ae677d19e643">commit to
ulisp</a>.</p>
<h4 id="reference">Reference</h4><p>The long-term approach for handling a reference syntactically is
probably to rewrite <code>&x</code> to <code>(& x)</code> in the
parser. The lazy approach we'll take for now is to handle a reference
as a special kind of identifier in <code>compileExpression</code>.</p>
<p>We'll use the LLVM <code>alloca</code> instruction to create space on
the stack. This will return a pointer and will turn the destination
variable into a pointer type. Then we'll use <code>store</code> to set
the value at the address to the current value of the variable being
referenced.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">context</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="c1">// Is a reference, push onto the stack and return its address</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">exp</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'&'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="mf">1</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">symbol</span><span class="p">,</span><span class="w"> </span><span class="nx">tmp</span><span class="p">,</span><span class="w"> </span><span class="nx">context</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb"> = alloca </span><span class="si">${</span><span class="nx">tmp</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb">, align 4`</span><span class="p">);</span>
<span class="w"> </span><span class="nx">destination</span><span class="p">.</span><span class="nx">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">tmp</span><span class="p">.</span><span class="nx">type</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">'*'</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`store </span><span class="si">${</span><span class="nx">tmp</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb"> %</span><span class="si">${</span><span class="nx">tmp</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb">, </span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb"> %</span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb">, align 4`</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<p>And now we're set to take the address of any code.</p>
<h3 id="system-calls">System calls</h3><p>LLVM IR provides no high-level means for making system calls. The
only way is to use inline assembly. The syntax is based on GCC inline
assembly and is confusing: there are few explained examples and the
error messages are unhelpful.</p>
<p>Thankfully the assembly code needed for a syscall is only one line,
one word: the <code>syscall</code> assembly instruction. We use inline
assembly variable-to-register mapping functionality to line up all the
parameters for the syscall. Here is an example:</p>
<div class="highlight"><pre><span></span><span class="nv">%result</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="k">asm</span><span class="w"> </span><span class="k">sideeffect</span><span class="w"> </span><span class="s">"syscall"</span><span class="p">,</span><span class="w"> </span><span class="s">"=r,{rax},{rdi},{rsi},{rdx}"</span><span class="w"> </span><span class="p">(</span><span class="kt">i64</span><span class="w"> </span><span class="nv">%raxArg</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%rdiArg</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%rsiArg</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%rdxArg</span><span class="p">)</span>
</pre></div>
<p>This says to execute the inline assembly string,
"syscall". The <code>sideeffect</code> flag means that this assembly
should always be run even if the result isn't used. <code>=r</code>
means the inline assembly returns a value, and the rest of the string
is the list of registers that arguments should be mapped to. Finally
we call the function with all the LLVM variables we want to be mapped.</p>
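<p>To make the mapping concrete, here is a minimal sketch that rebuilds
the example line above from its parts. The <code>syscallCall</code>
helper and <code>SYSCALL_ARG_REGS</code> are hypothetical names used only
for illustration; they are not part of the compiler, and the sketch
assumes every operand is an <code>i64</code>.</p>

```javascript
// Hypothetical helper, for illustration only: builds the inline assembly
// call for a syscall with up to six i64 arguments.
const SYSCALL_ARG_REGS = ['rdi', 'rsi', 'rdx', 'r10', 'r8', 'r9'];

function syscallCall(result, idOperand, argOperands) {
  if (argOperands.length > SYSCALL_ARG_REGS.length) {
    throw new Error('too many syscall arguments');
  }
  // "=r" declares the output, {rax} carries the syscall number, and the
  // remaining constraints map positionally onto the argument registers.
  const constraints = ['=r', '{rax}']
    .concat(argOperands.map((_, i) => `{${SYSCALL_ARG_REGS[i]}}`))
    .join(',');
  const operands = [idOperand]
    .concat(argOperands)
    .map((operand) => `i64 ${operand}`)
    .join(', ');
  return `${result} = call i64 asm sideeffect "syscall", "${constraints}" (${operands})`;
}

const exampleLine = syscallCall('%result', '%raxArg', ['%rdiArg', '%rsiArg', '%rdxArg']);
console.log(exampleLine);
```

<p>The constraint list and the operand list must stay in the same order:
the first operand lands in <code>rax</code>, the second in
<code>rdi</code>, and so on.</p>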
<p class="note">
Eventually we should also use the inline assembly syntax to list
registers that are modified so that LLVM can know to save them
before and after.
</p><h4 id="code">Code</h4><p>We'll add a mapping for <code>syscall/sys_write</code> and a helper
function for generating syscall code using the example above as a
template. We'll support 64-bit Darwin and Linux kernels.</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALL_TABLE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">darwin</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">sys_write</span><span class="o">:</span><span class="w"> </span><span class="mh">0x2000004</span><span class="p">,</span>
<span class="w"> </span><span class="nx">sys_exit</span><span class="o">:</span><span class="w"> </span><span class="mh">0x2000001</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="w"> </span><span class="nx">linux</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">sys_write</span><span class="o">:</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span>
<span class="w"> </span><span class="nx">sys_exit</span><span class="o">:</span><span class="w"> </span><span class="mf">60</span><span class="p">,</span>
<span class="w"> </span><span class="p">},</span>
<span class="p">}[</span><span class="nx">process</span><span class="p">.</span><span class="nx">platform</span><span class="p">];</span>
<span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'if'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'+'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'add'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'-'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'sub'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'*'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'mul'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'%'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'urem'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'<'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'icmp slt'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'='</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'icmp eq'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'syscall/sys_write'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileSyscall</span><span class="p">(</span><span class="nx">SYSCALL_TABLE</span><span class="p">.</span><span class="nx">sys_write</span><span class="p">),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">compileSyscall</span><span class="p">(</span><span class="nx">id</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">context</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">argTmps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">tmp</span><span class="p">,</span><span class="w"> </span><span class="nx">context</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">tmp</span><span class="p">.</span><span class="nx">type</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">' %'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">tmp</span><span class="p">.</span><span class="nx">value</span><span class="p">;</span>
<span class="w"> </span><span class="p">}).</span><span class="nx">join</span><span class="p">(</span><span class="s1">', '</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">regs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s1">'rdi'</span><span class="p">,</span><span class="w"> </span><span class="s1">'rsi'</span><span class="p">,</span><span class="w"> </span><span class="s1">'rdx'</span><span class="p">,</span><span class="w"> </span><span class="s1">'r10'</span><span class="p">,</span><span class="w"> </span><span class="s1">'r8'</span><span class="p">,</span><span class="w"> </span><span class="s1">'r9'</span><span class="p">];</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">params</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="sb">`{</span><span class="si">${</span><span class="nx">regs</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="si">}</span><span class="sb">}`</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="s1">','</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">idTmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">().</span><span class="nx">value</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">idTmp</span><span class="si">}</span><span class="sb"> = add i64 </span><span class="si">${</span><span class="nx">id</span><span class="si">}</span><span class="sb">, 0`</span><span class="p">)</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb"> = call </span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb"> asm sideeffect "syscall", "=r,{rax},</span><span class="si">${</span><span class="nx">params</span><span class="si">}</span><span class="sb">,~{dirflag},~{fpsr},~{flags}" (i64 %</span><span class="si">${</span><span class="nx">idTmp</span><span class="si">}</span><span class="sb">, </span><span class="si">${</span><span class="nx">argTmps</span><span class="si">}</span><span class="sb">)`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<h3 id="<code>></code>,-<code>/</code>"><code>></code>, <code>/</code></h3><p>Finally, we have a few new operations to add support for. But they'll
be pretty simple using the <code>compileOp</code> helper function.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'if'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'+'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'add'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'-'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'sub'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'*'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'mul'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'/'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'udiv'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'%'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'urem'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'<'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'icmp slt'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'>'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'icmp sgt'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'='</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'icmp eq'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'syscall/sys_write'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileSyscall</span><span class="p">(</span><span class="nx">SYSCALL_TABLE</span><span class="p">.</span><span class="nx">sys_write</span><span class="p">),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<h3 id="print">print</h3><p>We're ready to give our print function a shot.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.lisp
<span class="o">(</span>def<span class="w"> </span>print-char<span class="w"> </span><span class="o">(</span>c<span class="o">)</span>
<span class="w"> </span><span class="p">;</span><span class="w"> </span>First<span class="w"> </span>argument<span class="w"> </span>is<span class="w"> </span>stdout
<span class="w"> </span><span class="p">;</span><span class="w"> </span>Second<span class="w"> </span>argument<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>pointer<span class="w"> </span>to<span class="w"> </span>a<span class="w"> </span>char<span class="w"> </span>array<span class="w"> </span><span class="o">(</span>of<span class="w"> </span>length<span class="w"> </span>one<span class="o">)</span>
<span class="w"> </span><span class="p">;</span><span class="w"> </span>Third<span class="w"> </span>argument<span class="w"> </span>is<span class="w"> </span>the<span class="w"> </span>length<span class="w"> </span>of<span class="w"> </span>the<span class="w"> </span>char<span class="w"> </span>array
<span class="w"> </span><span class="o">(</span>syscall/sys_write<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">&</span>c<span class="w"> </span><span class="m">1</span><span class="o">))</span>
<span class="o">(</span>def<span class="w"> </span>print<span class="w"> </span><span class="o">(</span>n<span class="o">)</span>
<span class="w"> </span><span class="o">(</span><span class="k">if</span><span class="w"> </span><span class="o">(</span>><span class="w"> </span>n<span class="w"> </span><span class="m">9</span><span class="o">)</span>
<span class="w"> </span><span class="o">(</span>print<span class="w"> </span><span class="o">(</span>/<span class="w"> </span>n<span class="w"> </span><span class="m">10</span><span class="o">)))</span>
<span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="m">48</span><span class="w"> </span>is<span class="w"> </span>the<span class="w"> </span>ASCII<span class="w"> </span>code<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">'0'</span>
<span class="w"> </span><span class="o">(</span>print-char<span class="w"> </span><span class="o">(</span>+<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="o">(</span>%<span class="w"> </span>n<span class="w"> </span><span class="m">10</span><span class="o">))))</span>
<span class="o">(</span>def<span class="w"> </span>main<span class="w"> </span><span class="o">()</span>
<span class="w"> </span><span class="o">(</span>print<span class="w"> </span><span class="m">1234</span><span class="o">)</span>
<span class="w"> </span><span class="m">0</span><span class="o">)</span>
$<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span>test.lisp
$<span class="w"> </span>./build/a.out
<span class="m">1234</span>
</pre></div>
<p>Looks good! In the next post we'll talk about tail call elimination.</p>
http://notes.eatonphil.com/compiler-basics-llvm-system-calls.htmlSat, 22 Jun 2019 00:00:00 +0000
- Writing an x86 emulator from scratch in JavaScript: 1. a stack and register machinehttp://notes.eatonphil.com/emulator-basics-a-stack-and-register-machine.html<p class="note">
Better yet, take a look at this post walking through emulating x86 ELF binaries in Go:
<br />
<a href="/emulating-amd64-starting-with-elf.html">Emulating linux/AMD64 userland: interpreting an ELF binary</a>
<br />
<br />
Next up in emulator basics:
<! forgive me, for I have sinned >
<br />
<a href="/emulator-basics-system-calls.html">2. system calls</a>
</p><p>In this post we'll create a small virtual machine in JavaScript and
use it to run a simple C program compiled with GCC for an x86_64 (or
AMD64) CPU running Linux.</p>
<p><a href="https://github.com/eatonphil/x86e">All source code is available on GitHub.</a></p>
<h3 id="virtual-machine-data-storage">Virtual machine data storage</h3><p>Our virtual machine will have two means of storing data: registers and
an integer stack. Each register can store a 64-bit integer. The stack
is an array of 8-bit (or 1 byte) integers.</p>
<p>We'll make the following registers available for modification and use
by the program(mer):</p>
<div class="highlight"><pre><span></span><span class="nf">RDI</span><span class="p">,</span><span class="w"> </span><span class="nb">RSI</span><span class="p">,</span><span class="w"> </span><span class="nb">RSP</span><span class="p">,</span><span class="w"> </span><span class="nb">RBP</span><span class="p">,</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="nb">RBX</span><span class="p">,</span><span class="w"> </span><span class="nb">RCX</span><span class="p">,</span><span class="w"> </span><span class="nb">RDX</span><span class="p">,</span><span class="w"> </span><span class="nb">R8</span><span class="p">,</span><span class="w"> </span><span class="nb">R9</span><span class="p">,</span><span class="w"> </span><span class="nb">R10</span><span class="p">,</span><span class="w"> </span><span class="nb">R11</span><span class="p">,</span><span class="w"> </span><span class="nb">R12</span><span class="p">,</span><span class="w"> </span><span class="nb">R13</span><span class="p">,</span><span class="w"> </span><span class="nb">R14</span><span class="p">,</span><span class="w"> </span><span class="nb">R15</span>
</pre></div>
<p>The <code>RSP</code> register is used by the virtual machine for
tracking the location of the last entry in the stack. It will be
modified by the virtual machine when it encounters the
<code>POP</code>, <code>PUSH</code>, <code>CALL</code> and
<code>RET</code> instructions we'll support. We'll get into the
specifics shortly.</p>
<p>And we'll make the following registers available for use (but not
modification) by the program(mer):</p>
<div class="highlight"><pre><span></span><span class="nf">RIP</span><span class="p">,</span><span class="w"> </span><span class="nb">CS</span><span class="p">,</span><span class="w"> </span><span class="nb">DS</span><span class="p">,</span><span class="w"> </span><span class="nb">FS</span><span class="p">,</span><span class="w"> </span><span class="nb">SS</span><span class="p">,</span><span class="w"> </span><span class="nb">ES</span><span class="p">,</span><span class="w"> </span><span class="nb">GS</span><span class="p">,</span><span class="w"> </span><span class="nv">CF</span><span class="p">,</span><span class="w"> </span><span class="nv">ZF</span><span class="p">,</span><span class="w"> </span><span class="nv">PF</span><span class="p">,</span><span class="w"> </span><span class="nv">AF</span><span class="p">,</span><span class="w"> </span><span class="nv">SF</span><span class="p">,</span><span class="w"> </span><span class="nv">TF</span><span class="p">,</span><span class="w"> </span><span class="nv">IF</span><span class="p">,</span><span class="w"> </span><span class="nv">DF</span><span class="p">,</span><span class="w"> </span><span class="nv">OF</span>
</pre></div>
<p>Each of these has a special meaning but we'll focus on
<code>RIP</code>. The <code>RIP</code> register contains the address
of the instruction currently being interpreted by our virtual
machine. After every instruction the virtual machine will increment
the value in this register -- except for a few special instructions
like <code>CALL</code> and <code>RET</code>.</p>
<h4 id="memory-addresses">Memory addresses</h4><p>It will become useful to provide direct access to memory with a special
syntax. We'll focus just on 64-bit addresses that will look like this:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="kt">QWORD</span><span class="w"> </span><span class="nv">PTR</span><span class="w"> </span><span class="p">[</span><span class="nb">RBP</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">8</span><span class="p">],</span><span class="w"> </span><span class="mi">12</span>
</pre></div>
<p>This asks for the value <code>12</code> to be written to memory
starting at the address <code>RBP - 8</code>. The <code>QWORD PTR</code>
part clarifies that we want to write 8 bytes. Since <code>12</code> fits
in a single byte, the remaining bytes will be filled with
zeros.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">ADD</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="kt">QWORD</span><span class="w"> </span><span class="nv">PTR</span><span class="w"> </span><span class="p">[</span><span class="nb">RBP</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">8</span><span class="p">]</span>
</pre></div>
<p>This asks for eight bytes starting from the memory address <code>RBP -
8</code> to be added to the value in <code>RAX</code> and stored back
in <code>RAX</code>.</p>
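<p>As a rough sketch of what these two instructions do to the machine
state, assume memory is a plain array of bytes and multi-byte values are
stored little-endian, as on x86. The helper names
<code>writeMemoryBytes</code> and <code>readMemoryBytes</code> and the
starting register values are assumptions made for this example.</p>

```javascript
// Assumed representation: memory is an array of bytes, registers hold BigInts.
function writeMemoryBytes(memory, address, value, size) {
  // Truncate to the requested width, then store the lowest byte first.
  let remaining = BigInt.asUintN(size * 8, BigInt(value));
  for (let i = 0; i < size; i++) {
    memory[address + i] = Number(remaining & 0xffn);
    remaining >>= 8n;
  }
}

function readMemoryBytes(memory, address, size) {
  // Rebuild the value starting from the highest byte.
  let value = 0n;
  for (let i = size - 1; i >= 0; i--) {
    value = (value << 8n) | BigInt(memory[address + i]);
  }
  return value;
}

const memory = new Array(1000).fill(0);
const registers = { RBP: 1000n, RAX: 30n }; // arbitrary starting values

// MOV QWORD PTR [RBP - 8], 12
writeMemoryBytes(memory, Number(registers.RBP) - 8, 12n, 8);

// ADD RAX, QWORD PTR [RBP - 8]
registers.RAX += readMemoryBytes(memory, Number(registers.RBP) - 8, 8);

console.log(registers.RAX); // 42n
```

<p>Only the first of the eight bytes is non-zero after the write, which
is the zero-filling described above.</p>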
<h3 id="virtual-machine-instruction-set">Virtual machine instruction set</h3><p>In our virtual machine we'll define support for the following instructions:</p>
<ul>
<li><code>MOV $REGISTER, $REGISTER or $MEMORY ADDRESS or $LITERAL NUMBER</code><ul>
<li>This instruction copies the second value into the first.</li>
</ul>
</li>
<li><code>ADD $REGISTER, $REGISTER or $MEMORY ADDRESS</code><ul>
<li>This instruction adds the second value to the first and stores the result in the first.</li>
</ul>
</li>
<li><code>PUSH $REGISTER</code><ul>
<li>This instruction will decrement the <code>RSP</code> register by 8 bytes and store the value at the bottom of the stack.</li>
</ul>
</li>
<li><code>POP $REGISTER</code><ul>
<li>This instruction will increment the <code>RSP</code> register by 8 bytes, remove the last element in the stack (at the bottom), and store it into the register.</li>
</ul>
</li>
<li><code>CALL $LABEL</code><ul>
<li>This instruction will push the value in the <code>RIP</code> register (plus one) onto the stack and set the <code>RIP</code> register to the line of code of the label. More on this later.</li>
</ul>
</li>
<li><code>RET</code><ul>
<li>This instruction will remove the value at the bottom of the stack and store it in the <code>RIP</code> register.</li>
</ul>
</li>
</ul>
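<p>The four stack instructions above can be sketched as follows. This is
a simplified illustration under a few assumptions: the
<code>machine</code> object, its field names and the helper functions
are hypothetical, values are stored little-endian, and
<code>RIP</code> holds a line number.</p>

```javascript
const STACK_SIZE = 1000; // assumed size, for illustration
const machine = {
  memory: new Array(STACK_SIZE).fill(0),
  registers: { RSP: BigInt(STACK_SIZE), RIP: 0n },
};

// PUSH: move RSP down by 8 bytes, then store the value there.
function push(m, value) {
  m.registers.RSP -= 8n;
  const base = Number(m.registers.RSP);
  let remaining = BigInt.asUintN(64, value);
  for (let i = 0; i < 8; i++) {
    m.memory[base + i] = Number(remaining & 0xffn);
    remaining >>= 8n;
  }
}

// POP: read the 8 bytes at RSP, then move RSP back up by 8 bytes.
function pop(m) {
  const base = Number(m.registers.RSP);
  let value = 0n;
  for (let i = 7; i >= 0; i--) {
    value = (value << 8n) | BigInt(m.memory[base + i]);
  }
  m.registers.RSP += 8n;
  return value;
}

// CALL: remember the line to resume at, then jump to the label's line.
function call(m, labelLine) {
  push(m, m.registers.RIP + 1n);
  m.registers.RIP = BigInt(labelLine);
}

// RET: resume at the saved line.
function ret(m) {
  m.registers.RIP = pop(m);
}

machine.registers.RIP = 7n; // pretend the CALL sits on line 7
call(machine, 1);
const afterCall = { ...machine.registers };
ret(machine);
const afterRet = { ...machine.registers };
console.log(afterCall, afterRet);
```

<p>After the call, <code>RIP</code> is 1 and <code>RSP</code> is 992;
after the return, <code>RIP</code> is 8 and <code>RSP</code> is back at
1000.</p>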
<p>Now we have more than enough instructions to write some interesting
programs for the virtual machine.</p>
<h3 id="virtual-machine-semantics">Virtual machine semantics</h3><p>We'll make one last assumption before explaining further. In our
programs, there must be a <code>main</code> label which must contain
a <code>RET</code> instruction. Once we hit the terminal
<code>RET</code>, we will exit the virtual machine and set the exit
code to the value stored in the <code>RAX</code> register.</p>
<p>Let's look at a simple program:</p>
<div class="highlight"><pre><span></span><span class="nl">main:</span><span class="w"> </span><span class="c1">; the required main label</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="c1">; store 1 in RAX</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="c1">; store 2 in RDI</span>
<span class="w"> </span><span class="nf">ADD</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="nb">RDI</span><span class="w"> </span><span class="c1">; store the result of adding RAX and RDI in RAX</span>
<span class="w"> </span><span class="nf">RET</span><span class="w"> </span><span class="c1">; give control back to the virtual machine</span>
</pre></div>
<p>When we run this program, first we initialize a stack (we'll give it
1000 elements) and set the <code>RSP</code> register to 1000 (the top
of the stack). Then we look for the <code>main</code> label and set
the <code>RIP</code> register to 1, the line number after the line
where the label appears (0). Then, until the <code>RIP</code> register
is 1000, we interpret the instruction at the line number stored in the
<code>RIP</code> register. Once the <code>RIP</code> register hits
1000, we exit the program with the value in <code>RAX</code> as the exit code.</p>
<h4 id="one-more-example">One more example</h4><p>Now let's look at one more program:</p>
<div class="highlight"><pre><span></span><span class="nl">plus:</span>
<span class="w"> </span><span class="nf">ADD</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="nb">RSI</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="nb">RDI</span>
<span class="w"> </span><span class="nf">RET</span>
<span class="nl">main:</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RSI</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span>
<span class="w"> </span><span class="nf">CALL</span><span class="w"> </span><span class="nv">plus</span>
<span class="w"> </span><span class="nf">RET</span>
</pre></div>
<p>Our virtual machine will start at the line after the
<code>main</code> label. Then it will store <code>1</code> into
<code>RDI</code> and <code>2</code> into <code>RSI</code>. Then it
will jump to the second line in the program to add <code>RDI</code>
and <code>RSI</code> and store the result in <code>RDI</code>. Then it
will copy <code>RDI</code> into <code>RAX</code> and return control to
the final line. This last <code>RET</code> will in turn return control
to the virtual machine. Then the program will exit with exit code
<code>3</code>.</p>
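<p>To make that walkthrough concrete, here is a minimal sketch that traces
the program. It is not the interpreter we build below: it uses plain
numbers instead of BigInts, it takes hand-written instructions instead
of parsed text, and the names <code>runSketch</code> and
<code>SENTINEL</code> are made up for illustration. It also assumes the
terminal <code>RET</code> is detected by pushing a "one past the end"
return address before starting, and it counts instructions rather than
source lines, so labels do not take up an index.</p>

```javascript
// Hypothetical mini-interpreter for the walkthrough above.
function runSketch(instructions, labels) {
  const registers = { RAX: 0, RDI: 0, RSI: 0, RSP: 1000, RIP: labels.main };
  const stack = new Map(); // byte address -> value
  // Assumption: a return address one past the last instruction marks the terminal RET.
  const SENTINEL = instructions.length;

  const push = (value) => {
    registers.RSP -= 8;
    stack.set(registers.RSP, value);
  };
  const pop = () => {
    const value = stack.get(registers.RSP);
    registers.RSP += 8;
    return value;
  };
  // An operand is either a register name or an integer literal.
  const read = (operand) =>
    operand in registers ? registers[operand] : Number(operand);

  push(SENTINEL);
  const trace = []; // instruction indices in the order they were executed
  while (registers.RIP !== SENTINEL) {
    const { operation, operands } = instructions[registers.RIP];
    trace.push(registers.RIP);
    switch (operation) {
      case 'mov':
        registers[operands[0]] = read(operands[1]);
        registers.RIP++;
        break;
      case 'add':
        registers[operands[0]] += read(operands[1]);
        registers.RIP++;
        break;
      case 'call':
        push(registers.RIP + 1); // return to the instruction after the CALL
        registers.RIP = labels[operands[0]];
        break;
      case 'ret':
        registers.RIP = pop();
        break;
      default:
        throw new Error(`unsupported operation: ${operation}`);
    }
  }
  return { exitCode: registers.RAX, trace };
}

const exampleLabels = { plus: 0, main: 3 };
const exampleInstructions = [
  { operation: 'add', operands: ['RDI', 'RSI'] }, // 0 (plus)
  { operation: 'mov', operands: ['RAX', 'RDI'] }, // 1
  { operation: 'ret', operands: [] },             // 2
  { operation: 'mov', operands: ['RDI', '1'] },   // 3 (main)
  { operation: 'mov', operands: ['RSI', '2'] },   // 4
  { operation: 'call', operands: ['plus'] },      // 5
  { operation: 'ret', operands: [] },             // 6
];

const result = runSketch(exampleInstructions, exampleLabels);
console.log(result.exitCode);        // 3
console.log(result.trace.join(' ')); // 3 4 5 0 1 2 6
```

<p>The trace shows the jump into <code>plus</code> after the
<code>CALL</code> and the return to the final <code>RET</code> in
<code>main</code>.</p>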
<h3 id="parsing">Parsing</h3><p>Now that we've finished up describing our virtual machine language and
semantics, we need to parse the instructions into a format we can
easily interpret.</p>
<p>To do this we'll iterate over the program and skip any lines that
start with a dot. These are virtual machine directives that we can
safely ignore for now. We'll also remove everything from a semicolon
or hash sign through the end of the line. These are comments.</p>
<p>We'll store a dictionary mapping label names (without the colon) to
the index of the instruction that follows the label.</p>
<p>And we'll store the instructions as an array of objects composed of an
operation and optional operands.</p>
<h4 id="code">Code</h4><div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">instructions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lines</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">'\n'</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">trim</span><span class="p">();</span><span class="w"> </span><span class="c1">// Remove any trailing, leading whitespace</span>
<span class="w"> </span><span class="c1">// TODO: handle each line</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">labels</span><span class="p">,</span><span class="w"> </span><span class="nx">instructions</span><span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
<p>First let's handle the directives we want to ignore:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">trim</span><span class="p">();</span><span class="w"> </span><span class="c1">// Remove any trailing, leading whitespace</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'.'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And then comments:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">trim</span><span class="p">();</span><span class="w"> </span><span class="c1">// Remove any trailing, leading whitespace</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'.'</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">';'</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'#'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">';'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">';'</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">'#'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">'#'</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">line</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And then labels:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">trim</span><span class="p">();</span><span class="w"> </span><span class="c1">// Remove any trailing, leading whitespace</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'.'</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">';'</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'#'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">';'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">';'</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">'#'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">'#'</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">line</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">':'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">':'</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span>
<span class="w"> </span><span class="nx">labels</span><span class="p">[</span><span class="nx">label</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">instructions</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>And finally instruction parsing plus the rest:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">instructions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lines</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">'\n'</span><span class="p">);</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">trim</span><span class="p">();</span><span class="w"> </span><span class="c1">// Remove any trailing, leading whitespace</span>
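<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'.'</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">';'</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'#'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">';'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">';'</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">'#'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">'#'</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">line</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>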
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">':'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">':'</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span>
<span class="w"> </span><span class="nx">labels</span><span class="p">[</span><span class="nx">label</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">instructions</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="sr">/\s/</span><span class="p">)[</span><span class="mf">0</span><span class="p">].</span><span class="nx">toLowerCase</span><span class="p">();</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operands</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="nx">operation</span><span class="p">.</span><span class="nx">length</span><span class="p">).</span><span class="nx">split</span><span class="p">(</span><span class="s1">','</span><span class="p">).</span><span class="nx">map</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">trim</span><span class="p">());</span>
<span class="w"> </span><span class="nx">instructions</span><span class="p">.</span><span class="nx">push</span><span class="p">({</span>
<span class="w"> </span><span class="nx">operation</span><span class="p">,</span>
<span class="w"> </span><span class="nx">operands</span><span class="p">,</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">labels</span><span class="p">,</span><span class="w"> </span><span class="nx">instructions</span><span class="w"> </span><span class="p">};</span>
<span class="p">}</span>
</pre></div>
<p>Hurray! A brittle parser.</p>
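<p>To see the shape of what the parser returns, here is a condensed
restatement of it (including the comment and blank-line handling from
the earlier steps) run on the first example program. The
<code>.global main</code> directive line and the <code>program</code>
and <code>parsed</code> variable names are only there for
illustration.</p>

```javascript
// Condensed restatement of the parser, for illustration only.
function parse(program) {
  const labels = {};
  const instructions = [];

  for (const rawLine of program.split('\n')) {
    // Drop comments (';' or '#' through end of line) and surrounding whitespace.
    const line = rawLine.split(';')[0].split('#')[0].trim();

    // Skip blank lines and directives.
    if (!line || line.startsWith('.')) {
      continue;
    }

    // A label points at the index of the next instruction.
    if (line.includes(':')) {
      labels[line.split(':')[0]] = instructions.length;
      continue;
    }

    const operation = line.split(/\s/)[0].toLowerCase();
    const operands = line
      .substring(operation.length)
      .split(',')
      .map((t) => t.trim());
    instructions.push({ operation, operands });
  }

  return { labels, instructions };
}

const program = [
  '.global main',
  '',
  'main: ; the required main label',
  '  MOV RAX, 1 ; store 1 in RAX',
  '  MOV RDI, 2 # store 2 in RDI',
  '  ADD RAX, RDI',
  '  RET',
].join('\n');

const parsed = parse(program);
console.log(parsed.labels.main);          // 0
console.log(parsed.instructions.length);  // 4
console.log(parsed.instructions[0]);      // operation 'mov', operands 'RAX' and '1'
console.log(parsed.instructions[3]);      // operation 'ret', operands ['']
```

<p>Note that operands stay as strings, and that an instruction with no
operands, like <code>RET</code>, comes back with a single empty-string
operand because <code>''.split(',')</code> yields <code>['']</code>.
That is part of what makes the parser brittle.</p>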
<h3 id="interpreting">Interpreting</h3><p>We've already described the semantics a few times. So let's get
started with the foundation and initialization.</p>
<p>We'll use
<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt">BigInt</a>s
because JavaScript numbers can only represent integers exactly up to
53 bits. This isn't incredibly important in our basic programs but it
will quickly become painful without them.</p>
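<p>As a quick illustration of the problem (the variable names here are
arbitrary), regular numbers silently lose precision just past 2^53,
while BigInts do not:</p>

```javascript
// Plain numbers are IEEE 754 doubles: integers above 2^53 are not all representable.
const asNumber = 2 ** 53;
console.log(asNumber + 1 === asNumber); // true: the +1 is lost

// BigInt literals (note the n suffix) are arbitrary-precision integers.
const asBigInt = 2n ** 53n;
console.log(asBigInt + 1n === asBigInt); // false: the +1 is kept

// The largest integer a plain number can safely hold.
console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991
```

<p>The trade-off is that BigInts and regular numbers cannot be mixed in
arithmetic, which is why the code below is full of literals like
<code>0n</code> and <code>8n</code>.</p>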
<p>And we'll make process memory available as an array of 8-bit integers.
In order to make this easy to use, we'll also provide helper functions
for writing to and reading from memory.</p>
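<p>Here is a standalone sketch of the byte layout such helpers aim for:
the least significant byte goes at the lowest address (little-endian,
as on x86). The names <code>storeBytes</code> and
<code>loadBytes</code> are hypothetical stand-ins that take the memory
array directly rather than a process object.</p>

```javascript
// Write `size` bytes of `value` starting at `address`, least significant byte first.
function storeBytes(memory, address, value, size) {
  for (let i = 0n; i < size; i++) {
    memory[address + i] = (value >> (i * 8n)) & 0xFFn;
  }
}

// Reassemble `size` bytes starting at `address` into a single BigInt.
function loadBytes(memory, address, size) {
  let value = 0n;
  for (let i = 0n; i < size; i++) {
    // Unwritten cells are undefined, so treat them as zero.
    value |= (memory[address + i] || 0n) << (i * 8n);
  }
  return value;
}

const scratch = new Array(16);
storeBytes(scratch, 0n, 0x1122334455667788n, 8);

console.log(scratch[0].toString(16)); // "88" (lowest address holds the low byte)
console.log(scratch[7].toString(16)); // "11"
console.log(loadBytes(scratch, 0n, 8) === 0x1122334455667788n); // true
console.log(loadBytes(scratch, 0n, 2).toString(16)); // "7788"
```

<p>Indexing an array with a BigInt works because property keys are
converted to strings, so <code>scratch[0n]</code> and
<code>scratch[0]</code> refer to the same cell.</p>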
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">REGISTERS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s1">'RDI'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RSI'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RSP'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RBP'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RBX'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RCX'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RDX'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RIP'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R8'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'R9'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R10'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R11'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R12'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R13'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R14'</span><span class="p">,</span><span class="w"> </span><span class="s1">'R15'</span><span class="p">,</span><span class="w"> </span><span class="s1">'CS'</span><span class="p">,</span><span class="w"> </span><span class="s1">'DS'</span><span class="p">,</span><span class="w"> </span><span class="s1">'FS'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'SS'</span><span class="p">,</span><span class="w"> </span><span class="s1">'ES'</span><span class="p">,</span><span class="w"> </span><span class="s1">'GS'</span><span class="p">,</span><span class="w"> </span><span class="s1">'CF'</span><span class="p">,</span><span class="w"> </span><span class="s1">'ZF'</span><span class="p">,</span><span class="w"> </span><span class="s1">'PF'</span><span class="p">,</span><span class="w"> </span><span class="s1">'AF'</span><span class="p">,</span><span class="w"> </span><span class="s1">'SF'</span><span class="p">,</span><span class="w"> </span><span class="s1">'TF'</span><span class="p">,</span><span class="w"> </span><span class="s1">'IF'</span><span class="p">,</span><span class="w"> </span><span class="s1">'DF'</span><span class="p">,</span><span class="w"> </span><span class="s1">'OF'</span><span class="p">,</span>
<span class="p">];</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">writeMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">address</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">size</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0n</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">size</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Shift by the byte offset without modifying value between iterations</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">memory</span><span class="p">[</span><span class="nx">address</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">value</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">8n</span><span class="p">))</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="mh">0xFFn</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">readMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">address</span><span class="p">,</span><span class="w"> </span><span class="nx">size</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0n</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0n</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">size</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">|=</span><span class="w"> </span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">memory</span><span class="p">[</span><span class="nx">address</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="mi">0n</span><span class="p">)</span><span class="w"> </span><span class="o"><<</span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">8n</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">value</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// TODO: interpret</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">(</span><span class="nx">file</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">memory</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Array</span><span class="p">(</span><span class="mf">10000</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="nx">file</span><span class="p">).</span><span class="nx">toString</span><span class="p">();</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">instructions</span><span class="p">,</span><span class="w"> </span><span class="nx">labels</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">code</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">registers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">REGISTERS</span><span class="p">.</span><span class="nx">reduce</span><span class="p">((</span><span class="nx">rs</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">({</span><span class="w"> </span><span class="p">...</span><span class="nx">rs</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="nx">r</span><span class="p">]</span><span class="o">:</span><span class="w"> </span><span class="mi">0n</span><span class="w"> </span><span class="p">}),</span><span class="w"> </span><span class="p">{});</span>
<span class="w"> </span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">labels</span><span class="p">.</span><span class="nx">main</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">undefined</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="nx">labels</span><span class="p">.</span><span class="nx">_main</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="nx">labels</span><span class="p">.</span><span class="nx">main</span><span class="p">);</span>
<span class="w"> </span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">memory</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">process</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">registers</span><span class="p">,</span>
<span class="w"> </span><span class="nx">memory</span><span class="p">,</span>
<span class="w"> </span><span class="nx">instructions</span><span class="p">,</span>
<span class="w"> </span><span class="nx">labels</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="nx">writeMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span>
<span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">process</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Number</span><span class="p">(</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RAX</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">process</span><span class="p">.</span><span class="nx">exit</span><span class="p">(</span><span class="nx">main</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">[</span><span class="mf">2</span><span class="p">]));</span>
</pre></div>
<p>We'll accept <code>_main</code> as an entry point as well as
<code>main</code> to support our macOS users. If you know why our
macOS users use <code>_main</code> I'd love to know.</p>
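<p>To interpret, we grab the instruction pointed to by <code>RIP</code>
and switch on the operation. The loop stops once <code>RIP</code> equals
the return address that <code>main</code> stored at the top of the stack.</p>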
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">do</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">instruction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">instructions</span><span class="p">[</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="p">];</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operation</span><span class="p">.</span><span class="nx">toLowerCase</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'mov'</span><span class="o">:</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'add'</span><span class="o">:</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'call'</span><span class="o">:</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'ret'</span><span class="o">:</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'push'</span><span class="o">:</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'pop'</span><span class="o">:</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">readMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">memory</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">8</span><span class="p">),</span><span class="w"> </span><span class="mf">8</span><span class="p">)));</span>
<span class="p">}</span>
</pre></div>
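<p>The halt condition is easier to see in isolation. Here is a minimal
sketch, assuming a simplified machine where the stack is a plain array
and instructions are just operation names. <code>runUntilSentinel</code>
and its arguments are hypothetical and not part of the emulator.</p>

```javascript
// Hypothetical, trimmed-down model of the loop's halt condition.
// The stack starts out holding only the sentinel return address.
function runUntilSentinel(instructions, sentinel) {
  const stack = [sentinel];
  let rip = 0n;
  let steps = 0;
  do {
    const operation = instructions[Number(rip)];
    steps++;
    if (operation === 'ret') {
      rip = stack.pop(); // the final RET loads the sentinel into RIP
    } else {
      rip++; // every other instruction just advances RIP
    }
  } while (rip !== sentinel);
  return steps;
}

console.log(runUntilSentinel(['mov', 'add', 'ret'], 9992n)); // 3
```

<p>The final <code>RET</code> in the entry function pops the sentinel into
<code>RIP</code>, so the loop exits without needing a dedicated halt
instruction.</p>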
<h4 id="interpreting-mov">Interpreting MOV</h4><p>Example:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="nb">RDI</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="kt">QWORD</span><span class="w"> </span><span class="nv">PTR</span><span class="w"> </span><span class="p">[</span><span class="nb">RBP</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">8</span><span class="p">],</span><span class="w"> </span><span class="mi">8</span>
</pre></div>
<p>This instruction will store a value into a register or address and
increment <code>RIP</code>. If the left-hand side is a memory address
we will write to memory.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'mov'</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="o">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">rhs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">1</span><span class="p">]);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">REGISTERS</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="nx">lhs</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">[</span><span class="nx">lhs</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">rhs</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">writeMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">lhs</span><span class="p">.</span><span class="nx">address</span><span class="p">,</span><span class="w"> </span><span class="nx">rhs</span><span class="p">,</span><span class="w"> </span><span class="nx">lhs</span><span class="p">.</span><span class="nx">size</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<p>We're delegating to a helper function to handle registers vs. memory
addresses vs. literals appropriately. Without memory addresses it's a
simple function:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="o">:</span><span class="w"> </span><span class="kc">false</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">REGISTERS</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="nx">value</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">lhs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">value</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">[</span><span class="nx">value</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">.</span><span class="nx">asIntN</span><span class="p">(</span><span class="mf">64</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>We need to do some hacking to support memory addresses. We only handle
the <code>QWORD PTR [register - offset]</code> form and throw on anything else:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="o">:</span><span class="w"> </span><span class="kc">false</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">REGISTERS</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="nx">value</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">lhs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">value</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">[</span><span class="nx">value</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">value</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">'QWORD PTR ['</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">offsetString</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="s1">'QWORD PTR ['</span><span class="p">.</span><span class="nx">length</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">).</span><span class="nx">trim</span><span class="p">();</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">offsetString</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">'-'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">l</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">offsetString</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">'-'</span><span class="p">).</span><span class="nx">map</span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">trim</span><span class="p">()));</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">address</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">l</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">r</span><span class="p">;</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">8</span><span class="p">;</span><span class="w"> </span><span class="c1">// qword is 8 bytes</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">lhs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">address</span><span class="p">,</span><span class="w"> </span><span class="nx">size</span><span class="o">:</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">readMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">address</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Unsupported offset calculation: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">value</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">.</span><span class="nx">asIntN</span><span class="p">(</span><span class="mf">64</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
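<p>To make the address handling concrete, here is a standalone sketch of
resolving an operand like <code>QWORD PTR [RBP - 8]</code> and round-tripping
a value through little-endian memory. The helpers <code>writeBytes</code>,
<code>readBytes</code> and <code>resolveAddress</code> are assumptions
written for this example. They mirror the approach above but are not the
emulator's own functions.</p>

```javascript
// Assumed helper: store `size` bytes of `value`, least significant byte first.
function writeBytes(memory, address, value, size) {
  for (let i = 0n; i < BigInt(size); i++) {
    memory[Number(address + i)] = (value >> (i * 8n)) & 0xFFn;
  }
}

// Assumed helper: reassemble `size` bytes into one BigInt.
function readBytes(memory, address, size) {
  let value = 0n;
  for (let i = 0n; i < BigInt(size); i++) {
    value |= (memory[Number(address + i)] || 0n) << (i * 8n);
  }
  return value;
}

// Resolve "QWORD PTR [REG - N]" to an address given the register values.
function resolveAddress(operand, registers) {
  const inner = operand.slice('QWORD PTR ['.length, -1);
  const [register, offset] = inner.split('-').map((s) => s.trim());
  return registers[register] - BigInt(offset);
}

const memory = new Array(64);
const registers = { RBP: 32n };
const address = resolveAddress('QWORD PTR [RBP - 8]', registers);
writeBytes(memory, address, 258n, 8);

console.log(address); // 24n
console.log(memory[24], memory[25]); // 2n 1n
console.log(readBytes(memory, address, 8)); // 258n
```

<p>258 is <code>0x0102</code>, so the low byte (2) lands at the lowest
address and the next byte (1) follows it.</p>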
<h4 id="interpreting-add">Interpreting ADD</h4><p>Example:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">ADD</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="nb">RDI</span>
</pre></div>
<p>This instruction adds the value of the second operand to the first
register, stores the result there, and then increments the <code>RIP</code> register.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'add'</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="o">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">rhs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">1</span><span class="p">]);</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">[</span><span class="nx">lhs</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">rhs</span><span class="p">;</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
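<p>One thing the <code>add</code> case above does not do is wrap at 64
bits, which real x86-64 registers do. If that matters for your programs,
a possible tweak is to clamp the sum with <code>BigInt.asIntN</code>.
<code>add64</code> is a hypothetical helper for illustration, not part of
the emulator:</p>

```javascript
// Hypothetical helper: add two values and wrap the result to a signed
// 64-bit integer, the way a 64-bit register would.
function add64(a, b) {
  return BigInt.asIntN(64, a + b);
}

const INT64_MAX = 2n ** 63n - 1n;

console.log(add64(1n, 2n)); // 3n
console.log(add64(INT64_MAX, 1n)); // -9223372036854775808n
```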
<h4 id="interpreting-call">Interpreting CALL</h4><p>Example:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">CALL</span><span class="w"> </span><span class="nv">plus</span>
</pre></div>
<p>This instruction stores <code>RIP</code> (plus one, to continue after
the call instruction) on the stack and sets <code>RIP</code> to the
location specified by the label.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'call'</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">8n</span><span class="p">;</span>
<span class="w"> </span><span class="nx">writeMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1n</span><span class="p">,</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">0</span><span class="p">];</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">labels</span><span class="p">[</span><span class="nx">label</span><span class="p">]);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<h4 id="interpreting-ret">Interpreting RET</h4><p>Example:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">RET</span>
</pre></div>
<p>This instruction pops the return address off the top of the stack and
stores it in the <code>RIP</code> register.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'ret'</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">readMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">8n</span><span class="p">;</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">value</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
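<p>To see how <code>CALL</code> and <code>RET</code> pair up, here is a
small trace of <code>RIP</code> through one function call. This is a
sketch under simplifying assumptions: the stack is a plain array, the
<code>nop</code> operation and the <code>traceCalls</code> function are
made up for the example, and <code>-1n</code> stands in for the return
address seeded by <code>main</code>.</p>

```javascript
// Hypothetical mini machine: only CALL, RET and a no-op, enough to trace RIP.
function traceCalls(instructions, labels, entry) {
  const SENTINEL = -1n; // stands in for the seeded return address
  const stack = [SENTINEL];
  const trace = [];
  let rip = BigInt(labels[entry]);
  while (rip !== SENTINEL) {
    trace.push(Number(rip));
    const [operation, operand] = instructions[Number(rip)];
    if (operation === 'call') {
      stack.push(rip + 1n); // return address: the instruction after CALL
      rip = BigInt(labels[operand]);
    } else if (operation === 'ret') {
      rip = stack.pop(); // jump back to whatever CALL saved
    } else {
      rip++;
    }
  }
  return trace;
}

const instructions = [
  ['nop'],          // 0: plus
  ['ret'],          // 1
  ['call', 'plus'], // 2: main
  ['nop'],          // 3
  ['ret'],          // 4
];

console.log(traceCalls(instructions, { plus: 0, main: 2 }, 'main'));
// [ 2, 0, 1, 3, 4 ]
```

<p>Execution jumps from instruction 2 into <code>plus</code>, and the
<code>RET</code> at instruction 1 resumes at 3, the instruction after the
call.</p>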
<h4 id="interpreting-push">Interpreting PUSH</h4><p>Example:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">PUSH</span><span class="w"> </span><span class="nb">RAX</span>
</pre></div>
<p>This instruction decrements <code>RSP</code> by 8, writes the value in
the register to the new top of the stack, and increments <code>RIP</code>.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'push'</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">0</span><span class="p">]);</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">8n</span><span class="p">;</span>
<span class="w"> </span><span class="nx">writeMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<h4 id="interpreting-pop">Interpreting POP</h4><p>Example:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">POP</span><span class="w"> </span><span class="nb">RAX</span>
</pre></div>
<p>This instruction reads the value at the top of the stack, moves
<code>RSP</code> back up by 8, and stores the value in the register
specified. Then it increments <code>RIP</code>.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'pop'</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="o">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">readMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">8n</span><span class="p">;</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">[</span><span class="nx">lhs</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">value</span><span class="p">;</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
</pre></div>
<h3 id="all-together">All together</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test1.asm
main:<span class="w"> </span><span class="p">;</span><span class="w"> </span>the<span class="w"> </span>required<span class="w"> </span>main<span class="w"> </span>label
<span class="w"> </span>MOV<span class="w"> </span>RAX,<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>store<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>RAX
<span class="w"> </span>MOV<span class="w"> </span>RDI,<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>store<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>RDI
<span class="w"> </span>ADD<span class="w"> </span>RAX,<span class="w"> </span>RDI<span class="w"> </span><span class="p">;</span><span class="w"> </span>store<span class="w"> </span>the<span class="w"> </span>result<span class="w"> </span>of<span class="w"> </span>adding<span class="w"> </span>RAX<span class="w"> </span>and<span class="w"> </span>RDI<span class="w"> </span><span class="k">in</span><span class="w"> </span>RAX
<span class="w"> </span>RET<span class="w"> </span><span class="p">;</span><span class="w"> </span>give<span class="w"> </span>control<span class="w"> </span>back<span class="w"> </span>to<span class="w"> </span>the<span class="w"> </span>virtual<span class="w"> </span>machine
$<span class="w"> </span>node<span class="w"> </span>emulator.js<span class="w"> </span>test1.asm
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">3</span>
</pre></div>
<p>And finally, let's see what we can do with a simple C program:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>plus.c
long<span class="w"> </span>main<span class="o">()</span><span class="w"> </span><span class="o">{</span>
<span class="w"> </span>long<span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">;</span>
<span class="w"> </span>long<span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">6</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span>a<span class="w"> </span>+<span class="w"> </span>b<span class="p">;</span>
<span class="o">}</span>
$<span class="w"> </span>gcc<span class="w"> </span>-S<span class="w"> </span>-masm<span class="o">=</span>intel<span class="w"> </span>-o<span class="w"> </span>plus.s<span class="w"> </span>plus.c
$<span class="w"> </span>node<span class="w"> </span>emulator.js<span class="w"> </span>plus.s
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">11</span>
</pre></div>
<p>And we've got the start of a working x86_64/AMD64 emulator.</p>
<h3 id="next-steps">Next steps</h3><p>We aren't setting flags appropriately to support conditionals, so
that's low-hanging fruit. Syscalls also open up a new world
(that we'll end up needing since exit codes are limited to 8 bits of
information). Finally, our parsing is brittle. Dealing with ELF
files may be a better direction to go and would also enable more. We'll
explore these aspects and others in follow-up posts.</p>
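<p>To make the exit code limit concrete, here is a minimal sketch. The
helper name <code>exitStatusSeenByShell</code> is made up for this
illustration and is not part of the emulator. It assumes a POSIX-style
shell, where only the low 8 bits of the exit status reach the parent
process.</p>

```javascript
// Hypothetical helper (not part of the emulator above): compute the exit
// status a POSIX-style shell would report for a 64-bit register value.
function exitStatusSeenByShell(rax) {
  // Only the low 8 bits of the status reach the parent process.
  return Number(BigInt.asUintN(8, rax));
}

console.log(exitStatusSeenByShell(11n));  // 11
console.log(exitStatusSeenByShell(300n)); // 44, since 300 - 256 = 44
```

<p>So a program that returns 5 + 6 works, but one that returns 200 + 100
would appear to return 44.</p>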
<h3 id="human-interest">Human interest</h3><p>I originally intended to build a GameBoy emulator because the hardware
is simple and uniform. But I found it easiest to start hacking
together an AMD64 emulator because AMD64 is well-documented and gcc is
easy enough to use. I'm still interested in the GameBoy, though, unless/until I figure
out how to emulate a graphics card for AMD64.</p>
<p>It's tricky! But not that tricky. I built a <a href="https://github.com/eatonphil/x86e">graphical
debugger</a> around this emulator to
help out with the logic and off-by-one errors. But ultimately it's
been surprising to me how easy it is to get started -- especially when
I'm not concerned about absolute correctness (yet).</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Here's my first post on a series on emulator basics. It's baby's first stack and register virtual machine and it turns out it runs x86 code. <a href="https://t.co/WiWmGedawt">https://t.co/WiWmGedawt</a> <a href="https://twitter.com/hashtag/linux?src=hash&ref_src=twsrc%5Etfw">#linux</a> <a href="https://twitter.com/hashtag/assembly?src=hash&ref_src=twsrc%5Etfw">#assembly</a> <a href="https://t.co/xjiMkhgpdN">https://t.co/xjiMkhgpdN</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1132036835964870657?ref_src=twsrc%5Etfw">May 24, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/emulator-basics-a-stack-and-register-machine.htmlTue, 21 May 2019 00:00:00 +0000
- Tail call eliminationhttp://notes.eatonphil.com/tail-call-elimination.html<p>In this post we'll explore what tail calls are, why they are useful,
and how they can be eliminated in an interpreter, a compiler targeting
C++, and a compiler targeting LLVM IR.</p>
<h3 id="tail-calls">Tail calls</h3><p>A tail call is a function call made as the last step of a block,
where the block returns the value of the call directly (some languages
do not require an explicit <code>return</code>). Here are a few examples.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">tailCallEx1</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Loops forever but is a tail call.</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">tailCallEx1</span><span class="p">();</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">tailCallEx2</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">tailCallEx2</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">tailCallEx3</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">x</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="nx">tailCallEx3</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">tailCallEx4</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="mf">0</span><span class="o">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">tailCallEx4</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>And here are some examples of non-tail calls.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">nonTailCallEx1</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Not a tail call because the call is not the value returned.</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">nonTailCallEx1</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">nonTailCallEx2</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">nonTailCallEx2</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Not a tail call because the value is not *immediately* returned.</span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">r</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<h3 id="why-is-this-important?">Why is this important?</h3><p>Some languages can rewrite a recursive tail call as a jump/branch/goto instead
of a function call. This allows:</p>
<ol>
<li>Potential performance gain if function calls have large overhead</li>
<li>No stack overflows, because tail calls no longer add nested stack frames</li>
</ol>
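<p>Here is a rough sketch of the second point. It assumes a JavaScript
engine that does not eliminate tail calls, which as far as I know
includes current Node.js. The function names are made up for this
example.</p>

```javascript
// Tail-recursive in form, but without tail call elimination the engine
// still pushes one stack frame per call.
function countDownRecursive(n) {
  if (n === 0) {
    return 'done';
  }
  return countDownRecursive(n - 1);
}

// The same function after rewriting the tail call as a jump back to the top.
function countDownLoop(n) {
  while (true) {
    if (n === 0) {
      return 'done';
    }
    n = n - 1;
  }
}

// Return the result, or the error name if the call fails.
function attempt(fn, n) {
  try {
    return fn(n);
  } catch (e) {
    return e.name;
  }
}

console.log(attempt(countDownRecursive, 10));
console.log(attempt(countDownLoop, 10000000));
// Without tail call elimination this most likely reports a RangeError
// (call stack size exceeded) instead of 'done'.
console.log(attempt(countDownRecursive, 10000000));
```

<p>The loop version runs in constant stack space no matter how large
<code>n</code> gets.</p>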
<h3 id="implementation-1:-interpreter">Implementation 1: Interpreter</h3><p>Given a tail call recursive fibonacci:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">fibonacci</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fibonacci</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Here is how we could transform this by hand to avoid the tail call.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">fibonacci</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">a1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">b1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">n1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">a1</span><span class="p">;</span>
<span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">b1</span><span class="p">;</span>
<span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">n1</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>If this were written in a language with labels and goto, we could
simplify the code slightly by jumping back to a label. But the effect
is the same as a loop.</p>
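<p>One detail worth spelling out: the new arguments must all be computed
from the old values before any of them is assigned, otherwise
<code>a + b</code> would see the already-updated <code>a</code>. The
sketch below uses a labeled loop and a destructuring assignment in
place of the temporaries, and checks that the two forms agree. The
names <code>fibonacciRecursive</code> and <code>fibonacciLoop</code>
are only there to tell the versions apart.</p>

```javascript
function fibonacciRecursive(a, b, n) {
  if (n === 0) {
    return a;
  }
  if (n === 1) {
    return b;
  }
  return fibonacciRecursive(b, a + b, n - 1);
}

function fibonacciLoop(a, b, n) {
  tailRecurse: while (true) {
    if (n === 0) {
      return a;
    }
    if (n === 1) {
      return b;
    }
    // The right-hand side is evaluated completely with the old values
    // before a, b and n are overwritten.
    [a, b, n] = [b, a + b, n - 1];
    continue tailRecurse;
  }
}

// Compare both versions on small inputs.
for (let n = 0; n <= 20; n++) {
  if (fibonacciRecursive(0, 1, n) !== fibonacciLoop(0, 1, n)) {
    throw new Error('mismatch at n = ' + n);
  }
}

console.log(fibonacciLoop(0, 1, 10)); // 55
```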
<p>Since we're in an interpreter (that isn't JIT compiling), we cannot
pick between these two forms ahead of time and must merge them. So we
put every function body in a loop and break if the body did not end in
a tail call. Otherwise we line up the parameters and let the loop take
us back.</p>
<p>Here is an example of this strategy used in a <a href="https://github.com/eatonphil/bsdscheme">Scheme
interpreter</a> written in D.</p>
<div class="highlight"><pre><span></span><span class="c1">// Define a new function with name `name` and add it to the context.</span>
<span class="n">Value</span><span class="w"> </span><span class="n">namedLambda</span><span class="p">(</span><span class="n">Value</span><span class="w"> </span><span class="n">arguments</span><span class="p">,</span><span class="w"> </span><span class="n">Context</span><span class="w"> </span><span class="n">ctx</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="w"> </span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">funArguments</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">car</span><span class="p">(</span><span class="n">arguments</span><span class="p">);</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">funBody</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">cdr</span><span class="p">(</span><span class="n">arguments</span><span class="p">);</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">defined</span><span class="p">(</span><span class="n">Value</span><span class="w"> </span><span class="n">parameters</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="p">**</span><span class="w"> </span><span class="n">rest</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Context</span><span class="w"> </span><span class="n">newCtx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">ctx</span><span class="p">.</span><span class="n">dup</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Copy the runtime calling context to the new context.</span>
<span class="w"> </span><span class="n">Context</span><span class="w"> </span><span class="n">runtimeCtx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">cast</span><span class="p">(</span><span class="n">Context</span><span class="p">)(*</span><span class="n">rest</span><span class="p">);</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">runtimeCallingContext</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">runtimeCtx</span><span class="p">.</span><span class="n">callingContext</span><span class="p">;</span>
<span class="w"> </span><span class="n">newCtx</span><span class="p">.</span><span class="n">callingContext</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">runtimeCallingContext</span><span class="p">.</span><span class="n">dup</span><span class="p">;</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">result</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Loop forever, will break immediately if not a tail call</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">tailCalling</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">valueIsList</span><span class="p">(</span><span class="n">funArguments</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">keyTmp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToList</span><span class="p">(</span><span class="n">funArguments</span><span class="p">);</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">valueTmp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToList</span><span class="p">(</span><span class="n">parameters</span><span class="p">);</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToSymbol</span><span class="p">(</span><span class="n">keyTmp</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueTmp</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="w"> </span><span class="n">newCtx</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// TODO: handle arg count mismatch</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">valueIsList</span><span class="p">(</span><span class="n">keyTmp</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">keyTmp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToList</span><span class="p">(</span><span class="n">keyTmp</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
<span class="w"> </span><span class="n">valueTmp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToList</span><span class="p">(</span><span class="n">valueTmp</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">valueIsSymbol</span><span class="p">(</span><span class="n">funArguments</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToSymbol</span><span class="p">(</span><span class="n">funArguments</span><span class="p">);</span>
<span class="w"> </span><span class="n">newCtx</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">car</span><span class="p">(</span><span class="n">parameters</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(!</span><span class="n">valueIsNil</span><span class="p">(</span><span class="n">funArguments</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">error</span><span class="p">(</span><span class="s">"Expected symbol or list in lambda formals"</span><span class="p">,</span><span class="w"> </span><span class="n">funArguments</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(!</span><span class="n">tailCalling</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">newCtx</span><span class="p">.</span><span class="n">callingContext</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Tuple</span><span class="p">!(</span><span class="nb">string</span><span class="p">,</span><span class="w"> </span><span class="n">Delegate</span><span class="p">)(</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="p">&</span><span class="n">defined</span><span class="p">));</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">eval</span><span class="p">(</span><span class="n">withBegin</span><span class="p">(</span><span class="n">funBody</span><span class="p">),</span><span class="w"> </span><span class="k">cast</span><span class="p">(</span><span class="kt">void</span><span class="p">**)[</span><span class="n">newCtx</span><span class="p">]);</span>
<span class="w"> </span><span class="c1">// In a tail call, let the loop carry us back.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">newCtx</span><span class="p">.</span><span class="n">doTailCall</span><span class="w"> </span><span class="p">==</span><span class="w"> </span><span class="p">&</span><span class="n">defined</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">tailCalling</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span>
<span class="w"> </span><span class="n">parameters</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">result</span><span class="p">;</span>
<span class="w"> </span><span class="n">newCtx</span><span class="p">.</span><span class="n">doTailCall</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"> </span><span class="c1">// Not in a tail call, we're done with a regular function call.</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">result</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">makeFunctionValue</span><span class="p">(</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="p">&</span><span class="n">defined</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
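<p>Stripped of the Scheme details, the same strategy can be sketched in
a few lines of JavaScript. Everything here is hypothetical and only
meant to illustrate the idea: <code>TailCall</code>,
<code>defineFunction</code> and the <code>body</code> callback (which
stands in for evaluating the function's AST) are not taken from the
interpreter above.</p>

```javascript
// Hypothetical marker value: evaluating a call in tail position produces
// one of these instead of actually performing the call.
class TailCall {
  constructor(target, args) {
    this.target = target;
    this.args = args;
  }
}

// `body` stands in for evaluating the function's AST. It receives the
// bound parameters and a reference to the function being defined.
function defineFunction(paramNames, body) {
  function defined(args) {
    // Loop forever; we return immediately if the body was not a tail call.
    while (true) {
      const env = {};
      paramNames.forEach((name, i) => {
        env[name] = args[i];
      });

      const result = body(env, defined);

      // In a self tail call, line up the new arguments and loop.
      if (result instanceof TailCall && result.target === defined) {
        args = result.args;
        continue;
      }
      return result;
    }
  }
  return defined;
}

const sumTo = defineFunction(['acc', 'n'], (env, self) => {
  if (env.n === 0) {
    return env.acc;
  }
  return new TailCall(self, [env.acc + env.n, env.n - 1]);
});

console.log(sumTo([0, 1000000])); // 500000500000
```

<p>A million nested calls would normally exhaust the stack, but here
each "call" is just another trip around the loop. A
<code>TailCall</code> aimed at a different function is simply returned
as-is in this sketch.</p>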
<p class="note">
We cannot eliminate mutually recursive tail calls with this
method. We could use continuation-passing style, but that would not
address the concern: avoiding a function call.
</p><h3 id="implementation-2:-compiling-to-c++">Implementation 2: Compiling to C++</h3><p>The strategy here is the same as in the interpreter, except that
since tail-recursive functions are known at compile time, we can
generate specialized code for just those function bodies.</p>
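<p>As a rough illustration of "known at compile time", here is a sketch
of how a compiler might spot a self tail call. It assumes an
ESTree-style AST (<code>ReturnStatement</code>,
<code>CallExpression</code>, <code>Identifier</code>). The helper
<code>isSelfTailCall</code> is made up for this example and is not
necessarily how the compiler below does it.</p>

```javascript
// Hypothetical helper: does this statement return a direct call to `fnName`?
function isSelfTailCall(statement, fnName) {
  if (statement.type !== 'ReturnStatement' || !statement.argument) {
    return false;
  }
  const expr = statement.argument;
  return (
    expr.type === 'CallExpression' &&
    expr.callee.type === 'Identifier' &&
    expr.callee.name === fnName
  );
}

// Hand-built AST nodes, so the sketch does not depend on a parser.
const id = (name) => ({ type: 'Identifier', name });
const call = {
  type: 'CallExpression',
  callee: id('fibonacci'),
  arguments: [id('b')],
};

// return fibonacci(b);
const tail = { type: 'ReturnStatement', argument: call };

// return 1 + fibonacci(b);
const nonTail = {
  type: 'ReturnStatement',
  argument: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Literal', value: 1 },
    right: call,
  },
};

console.log(isSelfTailCall(tail, 'fibonacci'));    // true
console.log(isSelfTailCall(nonTail, 'fibonacci')); // false
```

<p>A real compiler would also need to walk into <code>if</code> and
<code>switch</code> bodies to find every return statement. This sketch
only checks a single statement.</p>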
<p>Here is how a <a href="https://github.com/eatonphil/jsc">JavaScript compiler</a>
transforms the above fibonacci implementation into C++:</p>
<div class="highlight"><pre><span></span><span class="nb nb-Type">void</span><span class="w"> </span><span class="n">tco_fib</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="o">&</span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Isolate</span><span class="w"> </span><span class="o">*</span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="o">.</span><span class="n">GetIsolate</span><span class="p">();</span>
<span class="w"> </span><span class="n">double</span><span class="w"> </span><span class="n">tco_n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">toNumber</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
<span class="w"> </span><span class="n">double</span><span class="w"> </span><span class="n">tco_a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">toNumber</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
<span class="w"> </span><span class="n">double</span><span class="w"> </span><span class="n">tco_b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">toNumber</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span>
<span class="n">tail_recurse_1</span><span class="p">:</span>
<span class="w"> </span><span class="p">;</span>
<span class="w"> </span><span class="nb nb-Type">bool</span><span class="w"> </span><span class="n">sym_if_test_58</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">tco_n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sym_if_test_58</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">args</span><span class="o">.</span><span class="n">GetReturnValue</span><span class="p">()</span><span class="o">.</span><span class="n">Set</span><span class="p">(</span><span class="n">Number</span><span class="p">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">tco_a</span><span class="p">));</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nb nb-Type">bool</span><span class="w"> </span><span class="n">sym_if_test_70</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">tco_n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sym_if_test_70</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">args</span><span class="o">.</span><span class="n">GetReturnValue</span><span class="p">()</span><span class="o">.</span><span class="n">Set</span><span class="p">(</span><span class="n">Number</span><span class="p">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">tco_b</span><span class="p">));</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">sym_arg_83</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="p">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">tco_n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">sym_arg_92</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="p">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">tco_a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">tco_b</span><span class="p">));</span>
<span class="w"> </span><span class="n">tco_n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">toNumber</span><span class="p">(</span><span class="n">sym_arg_83</span><span class="p">);</span>
<span class="w"> </span><span class="n">tco_a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tco_b</span><span class="p">;</span>
<span class="w"> </span><span class="n">tco_b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">toNumber</span><span class="p">(</span><span class="n">sym_arg_92</span><span class="p">);</span>
<span class="w"> </span><span class="n">goto</span><span class="w"> </span><span class="n">tail_recurse_1</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>This is implemented by checking every function call. If the function
call is in tail call position, we generate code for jumping to the
beginning of the function. Otherwise, we generate a call as usual.</p>
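<p>JavaScript has no <code>goto</code>, but the same jump can be sketched
by hand with a loop. This is only an illustration of what the generated
C++ above does, not output from the compiler;
<code>fibRecursive</code> and <code>fibLoop</code> are names made up for
this example.</p>

```javascript
// The tail-recursive original: the recursive call is the last thing evaluated.
function fibRecursive(n, a, b) {
  if (n === 0) return a;
  if (n === 1) return b;
  return fibRecursive(n - 1, b, a + b);
}

// The same function after tail call elimination, written by hand.
function fibLoop(n, a, b) {
  // The top of the loop plays the role of the tail_recurse_1 label.
  while (true) {
    if (n === 0) return a;
    if (n === 1) return b;
    // Evaluate every new argument before overwriting any parameter,
    // because the new value of b depends on the old value of a.
    const nextN = n - 1;
    const nextB = a + b;
    n = nextN;
    a = b;
    b = nextB;
    // Falling through to the next iteration is the `goto`.
  }
}

console.log(fibRecursive(10, 0, 1), fibLoop(10, 0, 1));
```

<p>Both calls return 55 for these arguments. The loop version never grows
the stack, which is the whole point of the transformation.</p>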
<p>Here is how the tail call check and code generation are done in the
<a href="https://github.com/eatonphil/jsc/blob/master/src/compile/compile.ts#L186">compiler</a>:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span>
<span class="w"> </span><span class="nx">context</span><span class="o">:</span><span class="w"> </span><span class="kt">Context</span><span class="p">,</span>
<span class="w"> </span><span class="nx">destination</span><span class="o">:</span><span class="w"> </span><span class="kt">Local</span><span class="p">,</span>
<span class="w"> </span><span class="nx">ce</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.CallExpression</span><span class="p">,</span>
<span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">tcoLabel</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">tcoParameters</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">ce</span><span class="p">.</span><span class="nx">expression</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">Identifier</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">identifier</span><span class="p">(</span><span class="nx">ce</span><span class="p">.</span><span class="nx">expression</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">Identifier</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">locals</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">mangle</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="kr">module</span><span class="nx">Name</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="p">));</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">safe</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">tco</span><span class="p">[</span><span class="nx">safe</span><span class="p">.</span><span class="nx">getCode</span><span class="p">()])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">safe</span><span class="p">.</span><span class="nx">getCode</span><span class="p">();</span>
<span class="w"> </span><span class="nx">tcoLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">tco</span><span class="p">[</span><span class="nx">safeName</span><span class="p">].</span><span class="nx">label</span><span class="p">;</span>
<span class="w"> </span><span class="nx">tcoParameters</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">tco</span><span class="p">[</span><span class="nx">safeName</span><span class="p">].</span><span class="nx">parameters</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">ce</span><span class="p">.</span><span class="nx">expression</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">Identifier</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">identifier</span><span class="p">(</span><span class="nx">ce</span><span class="p">.</span><span class="nx">expression</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">Identifier</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">mangled</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">mangle</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="kr">module</span><span class="nx">Name</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">locals</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">mangled</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">safe</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">tcoLabel</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">compileParameter</span><span class="p">(</span>
<span class="w"> </span><span class="nx">context</span><span class="p">,</span>
<span class="w"> </span><span class="nx">tcoParameters</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span>
<span class="w"> </span><span class="nx">i</span><span class="p">,</span>
<span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span>
<span class="w"> </span><span class="nx">arg</span><span class="p">,</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">emitStatement</span><span class="p">(</span><span class="sb">`goto </span><span class="si">${</span><span class="nx">tcoLabel</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="s1">''</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">);</span>
<span class="w"> </span><span class="nx">destination</span><span class="p">.</span><span class="nx">tce</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="c1">// Otherwise generate regular function call</span>
</pre></div>
<p>This requires building up state as you walk the AST so that you know
whether or not any particular call is in tail position.</p>
<h3 id="implementation-3:-compiling-to-llvm-ir">Implementation 3: Compiling to LLVM IR</h3><p>LLVM IR is the most boring case because all you do is mark each tail call as
being a tail call. Then, so long as the call meets some
<a href="https://llvm.org/docs/LangRef.html#id320">requirements</a>, the key one
being that the result of the call must be returned immediately, LLVM
will generate a jump instead of a call for you.</p>
<p>Given the following lisp-y implementation of the same tail call recursive
fibonacci function (compiler <a href="https://github.com/eatonphil/ulisp">here</a>):</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">fib</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="nv">n</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="nv">a</span>
<span class="w"> </span><span class="p">(</span><span class="nv">fib</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="nv">b</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">1</span><span class="p">))))</span>
</pre></div>
<p>We generate the following LLVM IR:</p>
<div class="highlight"><pre><span></span><span class="k">define</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="vg">@fib</span><span class="p">(</span><span class="kt">i64</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%b</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">%ifresult13</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">alloca</span><span class="w"> </span><span class="kt">i64</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span>
<span class="w"> </span><span class="nv">%sym14</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%n</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym15</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym12</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">icmp</span><span class="w"> </span><span class="k">eq</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym14</span><span class="p">,</span><span class="w"> </span><span class="nv">%sym15</span>
<span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">i1</span><span class="w"> </span><span class="nv">%sym12</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%iftrue16</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%iffalse17</span>
<span class="nl">iftrue16:</span>
<span class="w"> </span><span class="nv">%sym18</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="k">store</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym18</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="p">*</span><span class="w"> </span><span class="nv">%ifresult13</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span>
<span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%ifend19</span>
<span class="nl">iffalse17:</span>
<span class="w"> </span><span class="nv">%sym21</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%b</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym23</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym24</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%b</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym22</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym23</span><span class="p">,</span><span class="w"> </span><span class="nv">%sym24</span>
<span class="w"> </span><span class="nv">%sym26</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%n</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym27</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym25</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">sub</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym26</span><span class="p">,</span><span class="w"> </span><span class="nv">%sym27</span>
<span class="w"> </span><span class="c">; NOTE the `tail` before `call` here</span>
<span class="w"> </span><span class="nv">%sym20</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">tail</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="vg">@fib</span><span class="p">(</span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym21</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym22</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym25</span><span class="p">)</span>
<span class="w"> </span><span class="k">ret</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym20</span>
<span class="w"> </span><span class="k">store</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym20</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="p">*</span><span class="w"> </span><span class="nv">%ifresult13</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span>
<span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%ifend19</span>
<span class="nl">ifend19:</span>
<span class="w"> </span><span class="nv">%sym11</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">load</span><span class="w"> </span><span class="kt">i64</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="p">*</span><span class="w"> </span><span class="nv">%ifresult13</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span>
<span class="w"> </span><span class="k">ret</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym11</span>
<span class="p">}</span>
</pre></div>
<p>The only difference when supporting tail call elimination in LLVM IR is
whether the <code>call</code> instruction is preceded by a
<code>tail</code> marker. That makes the
<a href="https://github.com/eatonphil/ulisp/blob/master/src/backend/llvm.js#L198">implementation</a>
very simple:</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">isTailCall</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">TAIL_CALL_ENABLED</span><span class="w"> </span><span class="o">&&</span>
<span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">tailCallTree</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="nx">validFunction</span><span class="p">.</span><span class="nx">value</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">maybeTail</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">isTailCall</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s1">'tail '</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb"> = </span><span class="si">${</span><span class="nx">maybeTail</span><span class="si">}</span><span class="sb">call </span><span class="si">${</span><span class="nx">validFunction</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb"> @</span><span class="si">${</span><span class="nx">validFunction</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb">(</span><span class="si">${</span><span class="nx">safeArgs</span><span class="si">}</span><span class="sb">)`</span><span class="p">);</span>
<span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">isTailCall</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`ret </span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb"> %</span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
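<p>The fragment above depends on the surrounding compiler class. As a
rough, self-contained sketch of the same idea, the function below builds
the lines for one call. <code>emitCall</code> and the shape of its
argument are hypothetical stand-ins for this example, not the backend's
actual interface.</p>

```javascript
// Hypothetical, simplified stand-in for the backend's call emitter.
// `destination` and `fn` are { type, value } pairs; `args` is a list of them.
function emitCall({ destination, fn, args, isTailCall }) {
  const maybeTail = isTailCall ? 'tail ' : '';
  const argList = args.map((arg) => `${arg.type} %${arg.value}`).join(', ');
  const lines = [
    `%${destination.value} = ${maybeTail}call ${fn.type} @${fn.value}(${argList})`,
  ];
  if (isTailCall) {
    // Return the result right away, since the call's result must be
    // returned immediately for LLVM to emit a jump.
    lines.push(`ret ${destination.type} %${destination.value}`);
  }
  return lines;
}

const lines = emitCall({
  destination: { type: 'i64', value: 'sym20' },
  fn: { type: 'i64', value: 'fib' },
  args: [
    { type: 'i64', value: 'sym21' },
    { type: 'i64', value: 'sym22' },
    { type: 'i64', value: 'sym25' },
  ],
  isTailCall: true,
});
console.log(lines.join('\n'));
```

<p>With these inputs the two lines match the <code>tail call</code> and
<code>ret</code> instructions in the IR shown earlier.</p>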
<h4 id="generated-assembly">Generated assembly</h4><p>The resulting generated code (run through
<a href="https://llvm.org/docs/CommandGuide/llc.html">llc</a>) for that call will
be:</p>
<div class="highlight"><pre><span></span><span class="na">...</span>
<span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="no">rax</span><span class="p">,</span><span class="w"> </span><span class="no">rsi</span>
<span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="no">rdx</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="no">rdi</span><span class="p">,</span><span class="w"> </span><span class="no">rsi</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="no">rsi</span><span class="p">,</span><span class="w"> </span><span class="no">rax</span>
<span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="no">_fib</span><span class="w"> </span><span class="c1">## TAILCALL</span>
<span class="na">...</span>
</pre></div>
<p>And if tail call elimination is disabled:</p>
<div class="highlight"><pre><span></span><span class="na">...</span>
<span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="no">rax</span><span class="p">,</span><span class="w"> </span><span class="no">rsi</span>
<span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="no">rdx</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="no">rdi</span><span class="p">,</span><span class="w"> </span><span class="no">rsi</span>
<span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="no">rsi</span><span class="p">,</span><span class="w"> </span><span class="no">rax</span>
<span class="w"> </span><span class="nf">call</span><span class="w"> </span><span class="no">_fib</span>
<span class="na">...</span>
</pre></div>
<h3 id="summary">Summary</h3><p>The one piece I haven't covered is how you track whether or not a call
is in tail position. That is difficult to cover in a blog post because
it comes down to deciding, for each type of syntax node, whether or not
to propagate the tail state to its children. But generally speaking, if
a syntax node is not in tail position (e.g. it is not the last
expression in a block), you drop the tail state you've built up. When
you make a function call, you add the function name to the tail state.</p>
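<p>As a rough illustration of that propagation, here is a minimal sketch
that walks a toy AST and collects the names of calls in tail position.
The node shapes and the <code>collectTailCalls</code> helper are
assumptions made up for this example. It is a simplified variant, not how
either compiler above stores its tail state.</p>

```javascript
// Hypothetical AST shapes, for illustration only:
//   { type: 'call', name, args: [...] }
//   { type: 'if', test, then, else }
//   { type: 'block', body: [...] }
//   { type: 'ident', name } and { type: 'number', value }
function collectTailCalls(node, inTail = true, found = []) {
  switch (node.type) {
    case 'call':
      if (inTail) found.push(node.name);
      // Arguments are evaluated before the call itself, so the tail
      // state is dropped for each of them.
      node.args.forEach((arg) => collectTailCalls(arg, false, found));
      break;
    case 'if':
      // The test is never in tail position; both branches inherit it.
      collectTailCalls(node.test, false, found);
      collectTailCalls(node.then, inTail, found);
      collectTailCalls(node.else, inTail, found);
      break;
    case 'block':
      // Only the last expression in a block keeps the tail state.
      node.body.forEach((child, i) =>
        collectTailCalls(child, inTail && i === node.body.length - 1, found));
      break;
    default:
      // Identifiers and literals contain no calls.
      break;
  }
  return found;
}

const ident = (name) => ({ type: 'ident', name });
const num = (value) => ({ type: 'number', value });
const call = (name, ...args) => ({ type: 'call', name, args });

// Body of (def fib (a b n) (if (= n 0) a (fib b (+ a b) (- n 1))))
const tailFibBody = {
  type: 'if',
  test: call('=', ident('n'), num(0)),
  then: ident('a'),
  else: call('fib', ident('b'), call('+', ident('a'), ident('b')), call('-', ident('n'), num(1))),
};

console.log(collectTailCalls(tailFibBody));
```

<p>For the tail-recursive body only the <code>fib</code> call is
collected. In the naive <code>(+ (fib ...) (fib ...))</code> form the
<code>fib</code> calls are arguments to <code>+</code>, so they are never
collected.</p>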
<p>But I will be covering this in detail in the LLVM case in the next
post in my <a href="http://notes.eatonphil.com/compiler-basics-llvm-conditionals.html">compiler
basics</a>
series.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Put together a survey and summary of tail call elimination, the effect and implementation, in an interpreter, a compiler targeting C++, and a compiler targeting LLVM IR. <a href="https://t.co/pXiLoXjw2u">https://t.co/pXiLoXjw2u</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1128640717679734784?ref_src=twsrc%5Etfw">May 15, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/tail-call-elimination.htmlTue, 14 May 2019 00:00:00 +0000
- Writing a lisp compiler from scratch in JavaScript: 4. LLVM conditionals and compiling fibonaccihttp://notes.eatonphil.com/compiler-basics-llvm-conditionals.html<p class="note">
Previously in compiler basics:
<! forgive me, for I have sinned >
<br />
<a href="/compiler-basics-lisp-to-assembly.html">1. lisp to assembly</a>
<br />
<a href="/compiler-basics-functions.html">2. user-defined functions and variables</a>
<br />
<a href="/compiler-basics-llvm.html">3. LLVM</a>
<br />
Next in compiler basics:
<br />
<a href="/compiler-basics-llvm-system-calls.html">5. LLVM system calls</a>
<br />
<a href="/compiler-basics-an-x86-upgrade.html">6. an x86 upgrade</a>
</p><p>In this post we'll extend the
<a href="https://github.com/eatonphil/ulisp">compiler</a>'s LLVM backend to
support compiling conditionals such that we can support an
implementation of the fibonacci algorithm.</p>
<p>Specifically we're aiming for the following:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>tests/fib.lisp
<span class="o">(</span>def<span class="w"> </span>fib<span class="w"> </span><span class="o">(</span>n<span class="o">)</span>
<span class="w"> </span><span class="o">(</span><span class="k">if</span><span class="w"> </span><span class="o">(</span><<span class="w"> </span>n<span class="w"> </span><span class="m">2</span><span class="o">)</span>
<span class="w"> </span>n
<span class="w"> </span><span class="o">(</span>+<span class="w"> </span><span class="o">(</span>fib<span class="w"> </span><span class="o">(</span>-<span class="w"> </span>n<span class="w"> </span><span class="m">1</span><span class="o">))</span><span class="w"> </span><span class="o">(</span>fib<span class="w"> </span><span class="o">(</span>-<span class="w"> </span>n<span class="w"> </span><span class="m">2</span><span class="o">)))))</span>
<span class="o">(</span>def<span class="w"> </span>main<span class="w"> </span><span class="o">()</span>
<span class="w"> </span><span class="o">(</span>fib<span class="w"> </span><span class="m">8</span><span class="o">))</span>
$<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/fib.lisp
$<span class="w"> </span>./build/prog
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">21</span>
</pre></div>
<p>To do this we'll have to add the <code>&lt;</code>, <code>-</code> and
<code>if</code> built-ins.</p>
<p><a href="https://github.com/eatonphil/ulisp">All source code is available on GitHub</a>.</p>
<h3 id="subtraction">Subtraction</h3><p>Subtraction is the easiest to add since we already support addition: both are
arithmetic operations that take two integers and produce an integer. We simply add a
mapping of <code>-</code> to the LLVM instruction <code>sub</code> so
our LLVM backend constructor (<code>src/backends/llvm.js</code>) looks
like this:</p>
<div class="highlight"><pre><span></span><span class="p">...</span>
<span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'if'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'+'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'add'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'-'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'sub'</span><span class="p">),</span>
<span class="p">...</span>
</pre></div>
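<p>As a reminder of why one extra mapping is enough, here is a minimal
sketch of an opcode-parameterized helper. It is not the real
<code>compileOp</code> from the repository: the argument handling and the
<code>out</code> array are assumptions made to keep the example
self-contained, and the arguments are assumed to already be register
names.</p>

```javascript
// Hypothetical sketch: turn an LLVM opcode into a compile function
// for a two-argument built-in. Not the real compileOp.
function compileOp(op) {
  // a and b are assumed to be register names without the % sigil.
  return ([a, b], destination, out) => {
    out.push(`  %${destination} = ${op} i32 %${a}, %${b}`);
  };
}

const builtins = {
  '+': compileOp('add'),
  '-': compileOp('sub'),
};

const out = [];
builtins['-'](['sym1', 'sym2'], 'sym3', out);
console.log(out.join('\n')); // prints "  %sym3 = sub i32 %sym1, %sym2"
```

<p>Because the opcode is just a string spliced into the instruction, any
LLVM instruction with the same two-operand <code>i32</code> shape can
reuse the helper.</p>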
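<h3 id="less-than">Less than</h3><p>The <code>&lt;</code> built-in is a comparison operation. Comparisons are handled
differently from arithmetic operations in LLVM IR: they use a dedicated
instruction that takes a predicate and produces a one-bit <code>i1</code>
result. An integer comparison looks like this:</p>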
<div class="highlight"><pre><span></span><span class="nv nv-Anonymous">%3</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">icmp</span><span class="w"> </span><span class="k">slt</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv nv-Anonymous">%1</span><span class="p">,</span><span class="w"> </span><span class="nv nv-Anonymous">%2</span>
</pre></div>
<p>This performs an integer comparison (<code>icmp</code>) using the
signed less-than predicate (<code>slt</code>) on the <code>i32</code>
values in <code>%1</code> and <code>%2</code>, and stores the <code>i1</code> result in <code>%3</code>.</p>
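<p>The "signed" part matters because the same 32 bits can be read as a
signed or an unsigned number. The sketch below models <code>slt</code> and
its unsigned counterpart <code>ult</code> in plain JavaScript; the helper
names are made up for illustration.</p>

```javascript
// Reinterpret a JavaScript number as a signed or unsigned 32-bit integer.
const asSigned = (x) => x | 0;
const asUnsigned = (x) => x >>> 0;

// Model of `icmp slt i32 a, b`: compare the bits as signed values.
const slt = (a, b) => asSigned(a) < asSigned(b);
// Model of `icmp ult i32 a, b`: compare the same bits as unsigned values.
const ult = (a, b) => asUnsigned(a) < asUnsigned(b);

console.log(slt(-1, 2)); // true: -1 is less than 2
console.log(ult(-1, 2)); // false: the bits of -1 read as 4294967295
```

<p>Since our lisp integers can be negative, <code>slt</code> is the
predicate we want.</p>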
<p>We can shim this into our existing <code>compileOp</code> helper like
so:</p>
<div class="highlight"><pre><span></span><span class="p">...</span>
<span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'if'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'+'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'add'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'-'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'sub'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'<'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'icmp slt'</span><span class="p">),</span>
<span class="p">...</span>
</pre></div>
<h3 id="conditionals">Conditionals</h3><p>The last part we need to add is support for conditional execution of
code at runtime. Assembly-like languages handle this with "jumps" and
"labels". Jumping causes execution to continue at the address being
jumped to (instead of at the instruction following the jump).
Labels give you a way of naming an address instead of
having to calculate it yourself. Our code will look roughly like this:</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nv">%test</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">icmp</span><span class="w"> </span><span class="k">slt</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%n</span><span class="p">,</span><span class="w"> </span><span class="nv nv-Anonymous">%1</span>
<span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">i1</span><span class="w"> </span><span class="nv">%test</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%iftrue</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%iffalse</span>
<span class="nl">iftrue:</span>
<span class="w"> </span><span class="c">; do true stuff</span>
<span class="nl">iffalse:</span>
<span class="w"> </span><span class="c">; do false stuff</span>
<span class="w"> </span><span class="c">; do next stuff</span>
</pre></div>
<p>The <code>br</code> instruction can jump (or branch) conditionally or
unconditionally. This snippet demonstrates a conditional jump.</p>
<p>But there are a few things wrong with this pseudo-code. First, if
the condition is true, execution will simply continue on into the false
section once the true section finishes. Second, LLVM IR requires every
basic block (the instructions under a label) to end with a terminator
instruction such as <code>br</code> or <code>ret</code>. So we'll add a new label after the true
and false sections called <code>ifresult</code> and jump to it
unconditionally at the end of both.</p>
<div class="highlight"><pre><span></span><span class="w"> </span><span class="nv">%test</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">icmp</span><span class="w"> </span><span class="k">slt</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%n</span><span class="p">,</span><span class="w"> </span><span class="nv nv-Anonymous">%1</span>
<span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">i1</span><span class="w"> </span><span class="nv">%test</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%iftrue</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%iffalse</span>
<span class="nl">iftrue:</span>
<span class="w"> </span><span class="c">; do true stuff</span>
<span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%ifresult</span>
<span class="nl">iffalse:</span>
<span class="w"> </span><span class="c">; do false stuff</span>
<span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%ifresult</span>
<span class="nl">ifresult:</span>
<span class="w"> </span><span class="c">; do next stuff</span>
</pre></div>
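<p>To make the terminator rule concrete, here is a small, hypothetical
checker that scans IR text and reports labeled blocks whose last
instruction is not a terminator. It only recognizes <code>br</code>,
<code>ret</code> and <code>unreachable</code>, ignores any instructions
before the first label, and is not how LLVM itself verifies a module.</p>

```javascript
// Hypothetical helper: list labeled blocks that do not end in a terminator.
function blocksMissingTerminator(ir) {
  const terminators = ['br ', 'ret ', 'unreachable'];
  const missing = [];
  let label = null;
  let last = null;

  // Check the block that is being closed, if there is one.
  const close = () => {
    if (label === null) return;
    const ok = last !== null && terminators.some((t) => last.startsWith(t));
    if (!ok) missing.push(label);
  };

  for (const raw of ir.split('\n')) {
    const line = raw.split(';')[0].trim(); // drop comments and whitespace
    if (line === '') continue;
    if (line.endsWith(':')) {
      close();
      label = line.slice(0, -1);
      last = null;
    } else {
      last = line;
    }
  }
  close();
  return missing;
}

const ir = `
iftrue:
  %a = add i32 1, 0
iffalse:
  %b = add i32 2, 0
  br label %ifresult
ifresult:
  ret i32 0
`;
console.log(blocksMissingTerminator(ir)); // [ 'iftrue' ]
```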
<h3 id="scope">Scope</h3><p>Before implementing code generation for conditionals, we need to
update our <code>Scope</code> class so that <code>symbol</code> accepts a
prefix. This lets us generate labels through Scope that are
unique but still have useful names.</p>
<div class="highlight"><pre><span></span><span class="p">...</span>
<span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">symbol</span><span class="p">(</span><span class="nx">prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'sym'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">nth</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Object</span><span class="p">.</span><span class="nx">keys</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">).</span><span class="nx">length</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">register</span><span class="p">(</span><span class="nx">prefix</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">nth</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">...</span>
</pre></div>
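<p>Here is a self-contained sketch of how the prefix plays out. The
<code>symbol</code> method is the one above; the constructor and
<code>register</code> are simplified stand-ins (the real
<code>register</code> may do more than record the name).</p>

```javascript
class Scope {
  constructor() {
    this.locals = {};
  }

  // Assumed behavior: record the name and hand it back.
  register(name) {
    this.locals[name] = name;
    return name;
  }

  symbol(prefix = 'sym') {
    // The running count of locals makes every generated name unique.
    const nth = Object.keys(this.locals).length + 1;
    return this.register(prefix + nth);
  }
}

const scope = new Scope();
console.log(scope.symbol());          // sym1
console.log(scope.symbol('iftrue'));  // iftrue2
console.log(scope.symbol('iffalse')); // iffalse3
```

<p>The counter is shared across prefixes, so two labels can never
collide even if they use the same prefix.</p>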
<h3 id="compileif">compileIf</h3><p>Now we can add a primitive function mapping <code>if</code> to a new
<code>compileIf</code> helper and implement the helper.</p>
<div class="highlight"><pre><span></span><span class="p">...</span>
<span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'+'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'add'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'-'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'sub'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'<'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'icmp slt'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'if'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="p">...</span>
<span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">thenBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">elseBlock</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">testVariable</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// Compile expression and branch</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">testVariable</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">trueLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">'iftrue'</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">falseLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">'iffalse'</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`br i1 %</span><span class="si">${</span><span class="nx">testVariable</span><span class="si">}</span><span class="sb">, label %</span><span class="si">${</span><span class="nx">trueLabel</span><span class="si">}</span><span class="sb">, label %</span><span class="si">${</span><span class="nx">falseLabel</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile true section</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">trueLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">':'</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">thenBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">endLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">'ifend'</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'br label %'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">endLabel</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">falseLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">':'</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile false section</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">elseBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'br label %'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">endLabel</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile cleanup</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">endLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">':'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">...</span>
</pre></div>
<p>Note that this version passes the same <code>destination</code>
variable to both the true and false sections, so both sections assign to it. Let's try it out.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/fib.lisp
llc:<span class="w"> </span>error:<span class="w"> </span>llc:<span class="w"> </span>build/prog.ll:19:3:<span class="w"> </span>error:<span class="w"> </span>multiple<span class="w"> </span>definition<span class="w"> </span>of<span class="w"> </span><span class="nb">local</span><span class="w"> </span>value<span class="w"> </span>named<span class="w"> </span><span class="s1">'sym5'</span>
<span class="w"> </span>%sym5<span class="w"> </span><span class="o">=</span><span class="w"> </span>add<span class="w"> </span>i32<span class="w"> </span>%sym15,<span class="w"> </span>%sym16
<span class="w"> </span>^
child_process.js:665
<span class="w"> </span>throw<span class="w"> </span>err<span class="p">;</span>
<span class="w"> </span>^
Error:<span class="w"> </span>Command<span class="w"> </span>failed:<span class="w"> </span>llc<span class="w"> </span>-o<span class="w"> </span>build/prog.s<span class="w"> </span>build/prog.ll
llc:<span class="w"> </span>error:<span class="w"> </span>llc:<span class="w"> </span>build/prog.ll:19:3:<span class="w"> </span>error:<span class="w"> </span>multiple<span class="w"> </span>definition<span class="w"> </span>of<span class="w"> </span><span class="nb">local</span><span class="w"> </span>value<span class="w"> </span>named<span class="w"> </span><span class="s1">'sym5'</span>
<span class="w"> </span>%sym5<span class="w"> </span><span class="o">=</span><span class="w"> </span>add<span class="w"> </span>i32<span class="w"> </span>%sym15,<span class="w"> </span>%sym16
</pre></div>
<p>That's annoying. LLVM IR must be in static single assignment (SSA)
form: each variable may be assigned in exactly one place
within a function, even when, as here, only one of the two assignments
can ever execute on a given run.</p>
<p>To work around this we need to allocate a slot on the stack, store the
result of whichever section runs into that slot, and load from the
slot afterward into the destination variable.</p>
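<p>The rule <code>llc</code> is enforcing can be approximated in a few
lines. The following is a rough sketch, not LLVM's actual check: a
hypothetical function that scans IR text for any name assigned more than
once. The IR fragment is made up to resemble the failing output.</p>

```javascript
// Hypothetical helper: find local names that are defined more than once.
function duplicateDefinitions(ir) {
  const seen = new Set();
  const dupes = new Set();
  for (const line of ir.split('\n')) {
    // A definition looks like "%name = ...".
    const match = line.match(/^\s*%([\w.]+)\s*=/);
    if (!match) continue;
    const name = match[1];
    if (seen.has(name)) dupes.add(name);
    seen.add(name);
  }
  return [...dupes];
}

const ir = `
iftrue1:
  %sym5 = add i32 %sym2, 0
  br label %ifend3
iffalse2:
  %sym5 = add i32 %sym15, %sym16
  br label %ifend3
`;
console.log(duplicateDefinitions(ir)); // [ 'sym5' ]
```

<p>Only one of the two blocks runs at a time, but the check is purely
textual: two definitions of <code>%sym5</code> in one function are
rejected.</p>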
<h3 id="stack-memory-instructions">Stack memory instructions</h3><p>LLVM IR gives us <code>alloca</code> to allocate memory on the stack,
<code>store</code> to write a value to a stack address, and
<code>load</code> to read the value at a stack address into a
variable. Here's a simple example:</p>
<div class="highlight"><pre><span></span><span class="nv">%myvar</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">42</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="nv">%stackaddress</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">alloca</span><span class="w"> </span><span class="kt">i32</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span>
<span class="k">store</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%myvar</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="p">*</span><span class="w"> </span><span class="nv">%stackaddress</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span>
<span class="nv">%newvar</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">load</span><span class="w"> </span><span class="kt">i32</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="p">*</span><span class="w"> </span><span class="nv">%stackaddress</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span>
</pre></div>
<p>After this, <code>%newvar</code> holds 42.</p>
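<p>To see why a stack slot sidesteps the single-assignment problem, here
is the same idea modeled in JavaScript. The function name is made up; the
one-element <code>Int32Array</code> stands in for the memory returned by
<code>alloca</code>.</p>

```javascript
// Model of the workaround: one mutable slot, written by whichever
// branch runs, then read exactly once afterward.
function selectViaSlot(test, thenValue, elseValue) {
  const slot = new Int32Array(1); // alloca i32
  if (test) {
    slot[0] = thenValue;          // store in the true section
  } else {
    slot[0] = elseValue;          // store in the false section
  }
  const destination = slot[0];    // load after the sections rejoin
  return destination;
}

console.log(selectViaSlot(true, 10, 20));  // 10
console.log(selectViaSlot(false, 10, 20)); // 20
```

<p>Memory can be written any number of times, so both sections may store
to the slot while <code>destination</code> is still assigned only
once.</p>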
<h3 id="compileif-again">compileIf again</h3><p>Applying this back to our <code>compileIf</code> helper gives us:</p>
<div class="highlight"><pre><span></span><span class="p">...</span>
<span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">thenBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">elseBlock</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">testVariable</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">'ifresult'</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Space for result</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">result</span><span class="si">}</span><span class="sb"> = alloca i32, align 4`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile expression and branch</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">testVariable</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">trueLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">'iftrue'</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">falseLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">'iffalse'</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`br i1 %</span><span class="si">${</span><span class="nx">testVariable</span><span class="si">}</span><span class="sb">, label %</span><span class="si">${</span><span class="nx">trueLabel</span><span class="si">}</span><span class="sb">, label %</span><span class="si">${</span><span class="nx">falseLabel</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile true section</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">trueLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">':'</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tmp1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">thenBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">tmp1</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`store i32 %</span><span class="si">${</span><span class="nx">tmp1</span><span class="si">}</span><span class="sb">, i32* %</span><span class="si">${</span><span class="nx">result</span><span class="si">}</span><span class="sb">, align 4`</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">endLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">'ifend'</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'br label %'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">endLabel</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">falseLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">':'</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile false section</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tmp2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">elseBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">tmp2</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`store i32 %</span><span class="si">${</span><span class="nx">tmp2</span><span class="si">}</span><span class="sb">, i32* %</span><span class="si">${</span><span class="nx">result</span><span class="si">}</span><span class="sb">, align 4`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'br label %'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">endLabel</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Compile cleanup</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">endLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">':'</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb"> = load i32, i32* %</span><span class="si">${</span><span class="nx">result</span><span class="si">}</span><span class="sb">, align 4`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">...</span>
</pre></div>
<h3 id="trying-it-out">Trying it out</h3><p>We run our compiler one more time:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/fib.lisp
$<span class="w"> </span>./build/prog
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">21</span>
</pre></div>
<p>And get what we expect!</p>
<h3 id="next-up">Next up</h3><ul>
<li>Tail call optimization</li>
<li>Lists and dynamic memory</li>
<li>Strings?</li>
<li>Foreign function calls?</li>
<li>Self-hosting?</li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Latest post in the compiler basics series: using LLVM conditionals in compiling a fibonacci program <a href="https://t.co/A72yEDQ8sd">https://t.co/A72yEDQ8sd</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1125072731408666624?ref_src=twsrc%5Etfw">May 5, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/compiler-basics-llvm-conditionals.htmlSat, 04 May 2019 00:00:00 +0000
- Responsibility and ownershiphttp://notes.eatonphil.com/responsibility-and-ownership.html<p>Responsibility is only possible by granting ownership and setting
expectations. If you don't turn over ownership, don't expect folks to
take responsibility. When you grant ownership and set expectations,
you'll be astounded what folks will accomplish without you.</p>
<p>I am astounded.</p>
http://notes.eatonphil.com/responsibility-and-ownership.htmlTue, 30 Apr 2019 00:00:00 +0000
- Interpreting TypeScripthttp://notes.eatonphil.com/interpreting-typescript.html<p>In addition to providing a static type system and compiler for a
superset of JavaScript, TypeScript makes much of its functionality
available programmatically. In this post we'll use the TypeScript
compiler API to build an interpreter. We'll build off of a <a href="https://github.com/Microsoft/TypeScript/wiki/Using-the-Compiler-API">TypeScript
wiki
article</a>
and cover a few areas that were confusing to me as I built out
<a href="https://github.com/eatonphil/jsc">a separate project</a>.</p>
<p>The end result we're building will look like this:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.ts<span class="w"> </span><span class="c1"># A program we can interpret</span>
print<span class="o">(</span><span class="m">1</span><span class="w"> </span>+<span class="w"> </span><span class="m">5</span><span class="o">)</span><span class="p">;</span>
$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts<span class="w"> </span><span class="c1"># Build the source code for the interpreter</span>
$<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts<span class="w"> </span><span class="c1"># Run the interpreter against test program</span>
<span class="m">6</span>
</pre></div>
<p><a href="https://github.com/eatonphil/jsi">All code is available on GitHub.</a></p>
<h3 id="setup">Setup</h3><p>To begin with, we need Node.js and some dependencies:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>yarn<span class="w"> </span>add<span class="w"> </span>typescript<span class="w"> </span>@types/node
</pre></div>
<p>Then we can begin the first stage of an interpreter: parsing the code.</p>
<h3 id="parsing">Parsing</h3><p>Parsing a fixed set of files is simple enough. We pass a list of files
to <code>createProgram</code> along with compiler options. But, as a user, we
don't want to keep track of all files used by a program
(i.e. everything we import). The ideal situation is to pass a
single-file entrypoint (something like a main.js) and have our
interpreter figure out all the imports and handle them
recursively. More on this later; for now we'll just parse the
single-file entrypoint.</p>
<div class="highlight"><pre><span></span><span class="k">import</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s1">'typescript'</span><span class="p">;</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">TS_COMPILER_OPTIONS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">allowNonTsExtensions</span><span class="o">:</span><span class="w"> </span><span class="kt">true</span><span class="p">,</span>
<span class="p">};</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">fileName</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="o">:</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">Program</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">createProgram</span><span class="p">([</span><span class="nx">fileName</span><span class="p">],</span><span class="w"> </span><span class="nx">TS_COMPILER_OPTIONS</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">program</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="cm">/* TODO */</span><span class="w"> </span><span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">(</span><span class="nx">entrypoint</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">program</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">entrypoint</span><span class="p">);</span>
<span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">program</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">main</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">[</span><span class="mf">2</span><span class="p">]);</span>
</pre></div>
<h3 id="interpret-and-ts.program">interpret and ts.Program</h3><p>A program contains all source files as well as any implicitly needed
TypeScript definition files (for us it will just be the TypeScript
definitions for the Node.js standard library).</p>
<p class="note">
The program also gives us access to a type checker that we can use
to query the type of any node in the program tree. We'll get into
this in another post.
</p><p>Our <code>interpret</code> function will iterate over all the source files, ignoring
the TypeScript definition files, and call interpretNode on all the
elements of the source file.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="cm">/* TODO */</span><span class="w"> </span><span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">program</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">getSourceFiles</span><span class="p">().</span><span class="nx">map</span><span class="p">((</span><span class="nx">source</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">fileName</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">source</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">fileName</span><span class="p">.</span><span class="nx">endsWith</span><span class="p">(</span><span class="s1">'.d.ts'</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">forEachChild</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="p">));</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">results</span><span class="p">;</span>
<span class="w"> </span><span class="p">});</span>
<span class="p">}</span>
</pre></div>
<h3 id="interpretnode-and-ts.node">interpretNode and ts.Node</h3><p>A Node is a wrapper for most elements of what we consider a program to
be, such as a binary expression (<code>2 + 3</code>), a literal
expression (<code>2</code>), a function call expression
(<code>a(c)</code>), and so forth. When exploring a parser, it takes
time to become familiar with the particular way that a parser breaks
out a program into a tree of nodes.</p>
<p>As a concrete example, the following program:</p>
<div class="highlight"><pre><span></span><span class="nx">print</span><span class="p">(</span><span class="nx">a</span><span class="p">);</span>
</pre></div>
<p>Will be built into a ts.Node tree along these lines:</p>
<div class="highlight"><pre><span></span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">ExpressionStatement</span><span class="o">:</span><span class="w"> </span><span class="n">print</span><span class="o">(</span><span class="n">a</span><span class="o">);</span>
<span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">CallExpression</span><span class="o">:</span><span class="w"> </span><span class="n">print</span><span class="o">,</span><span class="w"> </span><span class="n">a</span>
<span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">Identifier</span><span class="o">:</span><span class="w"> </span><span class="n">print</span>
<span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">Identifier</span><span class="o">:</span><span class="w"> </span><span class="n">a</span>
<span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">EndOfFileToken</span>
</pre></div>
<p>And another example:</p>
<div class="highlight"><pre><span></span><span class="mf">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">3</span><span class="p">;</span>
</pre></div>
<p>Will be built into a ts.Node tree along these lines:</p>
<div class="highlight"><pre><span></span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">ExpressionStatement</span><span class="o">:</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">3</span><span class="o">;</span>
<span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">BinaryExpression</span><span class="o">:</span><span class="w"> </span><span class="mi">1</span><span class="o">,</span><span class="w"> </span><span class="mi">3</span><span class="o">,</span><span class="w"> </span><span class="o">+</span>
<span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">NumericLiteral</span><span class="o">:</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">NumericLiteral</span><span class="o">:</span><span class="w"> </span><span class="mi">3</span>
<span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">PlusToken</span>
<span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">EndOfFileToken</span>
</pre></div>
<p>But how would one come to know this?</p>
<h4 id="exploring-the-ts.node-tree">Exploring the ts.Node tree</h4><p>The easiest thing to do is throw an error on every Node type we don't
yet know about and fill in support for each program we throw at the
interpreter.</p>
<p>For example:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">default</span><span class="o">:</span>
<span class="w"> </span><span class="kt">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Unsupported node type: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">[</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">]);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
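<p>A complementary way to explore, sketched here rather than taken from
the interpreter itself, is to print the kind of every node in a small
program. This assumes the <code>typescript</code> package is installed;
<code>dumpTree</code> is a made-up helper name, and
<code>ts.createSourceFile</code> is used so the program can be parsed
from a string instead of a file.</p>

```typescript
import * as ts from 'typescript';

// Collect one line per node: indentation shows depth, and the label is the
// same SyntaxKind name that the error messages in this post print.
function dumpTree(node: ts.Node, depth: number = 0, lines: string[] = []): string[] {
  lines.push('  '.repeat(depth) + ts.SyntaxKind[node.kind]);
  ts.forEachChild(node, (child) => {
    dumpTree(child, depth + 1, lines);
  });
  return lines;
}

// Parse an in-memory string; 'example.ts' is only a label, no file is read.
const source = ts.createSourceFile('example.ts', 'print(a);', ts.ScriptTarget.ES2015);
const lines = dumpTree(source);
console.log(lines.join('\n'));
```

<p>For <code>print(a);</code> this should list a SourceFile containing an
ExpressionStatement, a CallExpression with two Identifier children, and
an EndOfFileToken.</p>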
<p>Now let's run our interpreter against an input file,
<code>test.ts</code>, that combines these two to make a
semi-interesting program:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.ts
print<span class="o">(</span><span class="m">1</span><span class="w"> </span>+<span class="w"> </span><span class="m">2</span><span class="o">)</span><span class="p">;</span>
$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts
$<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts
...
Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>ExpressionStatement
...
</pre></div>
<p>And we see an outer wrapper, an ExpressionStatement. To proceed we
look up the definition of an ExpressionStatement in TypeScript source
code,
<a href="https://github.com/Microsoft/TypeScript/blob/master/src/compiler/types.ts">src/compiler/types.ts</a>
to be specific. This file will become our best friend. Hit ctrl-f and
look for "interface ExpressionStatement ". We see that it has only one
child, <code>expression</code>, so we call <code>interpretNode</code>
on this recursively:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">ExpressionStatement</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">es</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">ExpressionStatement</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">es</span><span class="p">.</span><span class="nx">expression</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">default</span><span class="o">:</span>
<span class="w"> </span><span class="kt">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Unsupported node type: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">[</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">]);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Thankfully TypeScript will be very quick to call us out if we
misunderstand this structure.</p>
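<p>As a hedged alternative to casting after a switch on
<code>node.kind</code>, the compiler API also exposes type guard
functions such as <code>ts.isExpressionStatement</code> and
<code>ts.isCallExpression</code> that narrow the type inside each
branch. The sketch below is not how the interpreter in this post is
written; <code>describeNode</code> is a hypothetical helper used only
to show the narrowing.</p>

```typescript
import * as ts from 'typescript';

// Each ts.isXxx guard narrows the type of `node` inside its branch,
// so no `as` cast is needed to reach fields like `expression`.
function describeNode(node: ts.Node): string {
  if (ts.isExpressionStatement(node)) {
    return 'statement wrapping ' + describeNode(node.expression);
  }
  if (ts.isCallExpression(node)) {
    return 'call with ' + node.arguments.length + ' argument(s)';
  }
  // Anything else: fall back to the SyntaxKind name.
  return ts.SyntaxKind[node.kind];
}

const file = ts.createSourceFile('example.ts', 'print(1 + 2);', ts.ScriptTarget.ES2015);
const descriptions = file.statements.map((statement) => describeNode(statement));
console.log(descriptions.join('\n'));
```

<p>Whether guards or a switch reads better is a matter of taste; the
switch keeps all the supported kinds visible in one place.</p>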
<p class="note">
It's pretty weird to me that the ts.Node tree is organized such that
I must cast at each ts.Node, but that's what they do even in the
TypeScript source, so I don't think I'm misunderstanding.
</p><p>Now we recompile and run the interpreter against the program to
discover the next ts.Node type.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts
$<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts
...
Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>CallExpression
...
</pre></div>
<p>Cool! Back to
<a href="https://github.com/Microsoft/TypeScript/blob/master/src/compiler/types.ts">src/compiler/types.ts</a>.
Call expressions are complex enough that we'll break out handling them
into a separate function.</p>
<h3 id="interpretcall-and-ts.callexpression">interpretCall and ts.CallExpression</h3><p>From our reading of types.ts we need to handle the expression that
evaluates to a function, and we need to handle its arguments. We'll
just call <code>interpretNode</code> on each of these to get their
real value. And finally we'll call the function with the arguments.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretCall</span><span class="p">(</span><span class="nx">ce</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.CallExpression</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">ce</span><span class="p">.</span><span class="nx">expression</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">ce</span><span class="p">.</span><span class="nx">arguments</span><span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="nx">interpretNode</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fn</span><span class="p">(...</span><span class="nx">args</span><span class="p">);</span>
<span class="p">}</span>
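<span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>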
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">CallExpression</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">ce</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">CallExpression</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">interpretCall</span><span class="p">(</span><span class="nx">ce</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p class="note">
Please ignore the fact that we are not correctly setting
<code>this</code> here.
</p><p>Recompile and let's see what's next!</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts
$<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts
...
Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>Identifier
...
</pre></div>
<p>And back to types.ts.</p>
<h3 id="ts.identifier">ts.Identifier</h3><p>In order to support identifiers in general we'd need to have a context
we could use to look up the value of an identifier. But we don't have
a context like this right now so we'll add builtin support for a
<code>print</code> function so we can get some output!</p>
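<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>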
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">Identifier</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">Identifier</span><span class="p">).</span><span class="nx">escapedText</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="kt">string</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">id</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">'print'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="p">(...</span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(...</span><span class="nx">args</span><span class="p">);</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Unsupported identifier: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">id</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
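<p>For a sense of what the missing context could look like, here is a
minimal sketch. It is an assumption about one possible design, not
code from this interpreter: <code>Scope</code>, <code>define</code>
and <code>lookup</code> are hypothetical names, and the class is just
a map of names to values with an optional parent to fall back on.</p>

```typescript
// A hypothetical Scope: names map to values, and lookups that miss
// locally fall through to the parent scope, if there is one.
class Scope {
  private readonly values = new Map<string, unknown>();
  private readonly parent: Scope | undefined;

  constructor(parent?: Scope) {
    this.parent = parent;
  }

  define(name: string, value: unknown): void {
    this.values.set(name, value);
  }

  lookup(name: string): unknown {
    if (this.values.has(name)) {
      return this.values.get(name);
    }
    if (this.parent) {
      return this.parent.lookup(name);
    }
    throw new Error('Unsupported identifier: ' + name);
  }
}

// Builtins such as print would live in the outermost scope.
const printed: unknown[][] = [];
const globals = new Scope();
globals.define('print', (...args: unknown[]) => {
  printed.push(args); // recorded so the call can be inspected afterwards
  console.log(...args);
});

// A nested scope (say, for a function body) sees its own names and the builtins.
const inner = new Scope(globals);
inner.define('a', 3);
const printFn = inner.lookup('print') as (...args: unknown[]) => void;
printFn(inner.lookup('a'));
```

<p>With something like this in place, the Identifier case could call
<code>lookup</code> instead of special-casing <code>print</code>.</p>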
<p>Recompile and let's see what's next!</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts
$<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts
...
Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>BinaryExpression
...
</pre></div>
<p>And we're finally into the arguments.</p>
<h3 id="interpretbinaryexpression-and-ts.binaryexpression">interpretBinaryExpression and ts.BinaryExpression</h3><p>Looking into types.ts for this Node type suggests we may want to break
this out into its own function as well; there are a ton of operator
types. Within the <code>interpretBinaryExpression</code> helper we'll
interpret each operand and then switch on the operator type. We'll
throw an error on operators we don't know about -- all of them at
first:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretBinaryExpression</span><span class="p">(</span><span class="nx">be</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.BinaryExpression</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span class="nx">left</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span class="nx">right</span><span class="p">);</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span class="nx">operatorToken</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">default</span><span class="o">:</span>
<span class="w"> </span><span class="kt">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Unsupported operator: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">[</span><span class="nx">be</span><span class="p">.</span><span class="nx">operatorToken</span><span class="p">.</span><span class="nx">kind</span><span class="p">]);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
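<span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>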
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">BinaryExpression</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">BinaryExpression</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">interpretBinaryExpression</span><span class="p">(</span><span class="nx">be</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>We know the drill.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts
$<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts
...
Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>FirstLiteralToken
...
</pre></div>
<p>At this point we're actually failing first on an unknown <strong>node type</strong>
rather than an operator. This is because we interpret the operands
(which are numeric literals) before we look up the operator. Time to
revisit types.ts!</p>
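<p>The ordering can be reproduced in isolation. The following is only a
sketch in plain JavaScript: it uses hypothetical plain-object nodes
(<code>kind</code>, <code>operator</code>, <code>left</code>,
<code>right</code>) as stand-ins rather than the real TypeScript
Compiler API, and neither the literal nor the operator is supported
yet:</p>

```javascript
// Hypothetical stand-in AST made of plain objects, not the TypeScript Compiler API.
function interpretNode(node) {
  switch (node.kind) {
    case 'BinaryExpression':
      return interpretBinaryExpression(node);
    default:
      throw new Error('Unsupported node type: ' + node.kind);
  }
}

function interpretBinaryExpression(be) {
  // Both operands are interpreted first...
  const left = interpretNode(be.left);
  const right = interpretNode(be.right);
  // ...so the operator is only looked at once both operands are supported.
  switch (be.operator) {
    default:
      throw new Error('Unsupported operator: ' + be.operator);
  }
}

const onePlusTwo = {
  kind: 'BinaryExpression',
  operator: 'PlusToken',
  left: { kind: 'NumericLiteral', text: '1' },
  right: { kind: 'NumericLiteral', text: '2' },
};

let message;
try {
  interpretNode(onePlusTwo);
} catch (e) {
  message = e.message;
}
console.log(message); // Unsupported node type: NumericLiteral
```

<p>The operator error can only surface once the operand's node type is
handled.</p>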
<h3 id="ts.firstliteraltoken,-ts.numericliteral">ts.FirstLiteralToken, ts.NumericLiteral</h3><p>Looking at types.ts shows us that <code>FirstLiteralToken</code> is a
synonym for <code>NumericLiteral</code>. The latter name is more
obvious, so let's add that to our supported Node list:</p>
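<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>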
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">NumericLiteral</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">nl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">NumericLiteral</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Number</span><span class="p">(</span><span class="nx">nl</span><span class="p">.</span><span class="nx">text</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>And we keep going!</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts
$<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts
...
Error:<span class="w"> </span>Unsupported<span class="w"> </span>operator:<span class="w"> </span>PlusToken
...
</pre></div>
<p>And we're into unknown operator territory!</p>
<h3 id="interpretbinaryexpression-and-ts.plustoken">interpretBinaryExpression and ts.PlusToken</h3><p>As a simple extension to our existing
<code>interpretBinaryExpression</code>, we return the sum of the left
and right values:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretBinaryExpression</span><span class="p">(</span><span class="nx">be</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.BinaryExpression</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span class="nx">left</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span class="nx">right</span><span class="p">);</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span class="nx">operatorToken</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts.SyntaxKind.PlusToken</span><span class="o">:</span>
<span class="w"> </span><span class="kt">return</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">right</span><span class="p">;</span>
<span class="w"> </span><span class="nx">default</span><span class="o">:</span>
<span class="w"> </span><span class="kt">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Unsupported operator: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">[</span><span class="nx">be</span><span class="p">.</span><span class="nx">operatorToken</span><span class="p">.</span><span class="nx">kind</span><span class="p">]);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>And we give it another shot.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts
$<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts
...
Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>EndOfFileToken
...
</pre></div>
<h3 id="ts.syntaxkind.endoffiletoken">ts.SyntaxKind.EndOfFileToken</h3><p>For our final Node type before a working program, we simply do nothing:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts.SyntaxKind.EndOfFileToken</span><span class="o">:</span>
<span class="w"> </span><span class="kt">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>One more time:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts
$<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts
<span class="m">3</span>
</pre></div>
<p>A working program! And if we jiggle the test?</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.ts
print<span class="o">(</span><span class="m">1</span><span class="w"> </span>+<span class="w"> </span><span class="m">5</span><span class="o">)</span><span class="p">;</span>
$<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts
<span class="m">6</span>
</pre></div>
<p>We're well on our way to interpreting TypeScript, and have gained some
familiarity with the TypeScript Compiler API.</p>
<p><a href="https://github.com/eatonphil/jsi">All code is available on GitHub.</a></p>
http://notes.eatonphil.com/interpreting-typescript.htmlSun, 14 Apr 2019 00:00:00 +0000
- Writing a web server from scratch: 1. HTTP and socketshttp://notes.eatonphil.com/web-server-basics-http-and-sockets.html<p>Say we have some HTML:</p>
<div class="highlight"><pre><span></span><span class="p"><</span><span class="nt">html</span><span class="p">></span>
<span class="p"><</span><span class="nt">body</span><span class="p">></span>
<span class="p"><</span><span class="nt">h1</span><span class="p">></span>Hello world!<span class="p"></</span><span class="nt">h1</span><span class="p">></span>
<span class="p"></</span><span class="nt">body</span><span class="p">></span>
<span class="p"></</span><span class="nt">html</span><span class="p">></span>
</pre></div>
<p>And say we'd like to be able to render this page in a web browser. If
the server is hosted locally we may want to enter
<code>localhost:9000/hello-world.html</code> in the address bar, hit
enter, make a request (done by the browser), receive a response (sent
by some server), and render the result (done by the browser).</p>
<p>Here is a minimal, incomplete, and unsafe Node.js program (about
100 LoC) that would serve this (<a href="https://github.com/eatonphil/uweb">code available on
GitHub</a>):</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">net</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'net'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">CRLF</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'\r\n'</span><span class="p">;</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">HELLO_WORLD</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sb">`<html></span>
<span class="sb"> <body></span>
<span class="sb"> <h1>Hello world!</h1></span>
<span class="sb"> </body></span>
<span class="sb"></html>`</span><span class="p">;</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">NOT_FOUND</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sb">`<html></span>
<span class="sb"> <body></span>
<span class="sb"> <h1>Not found</h1></span>
<span class="sb"> </body></span>
<span class="sb"></html>`</span><span class="p">;</span>
<span class="kd">class</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">connection</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">statusLine</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span>
<span class="w"> </span><span class="nx">headers</span><span class="o">:</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="nx">body</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lines</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">buffer</span><span class="p">.</span><span class="nx">toString</span><span class="p">().</span><span class="nx">split</span><span class="p">(</span><span class="nx">CRLF</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Parse/store status line if necessary</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">method</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">,</span><span class="w"> </span><span class="nx">protocol</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">().</span><span class="nx">split</span><span class="p">(</span><span class="s1">' '</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">method</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">,</span><span class="w"> </span><span class="nx">protocol</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Parse/store headers if the body hasn't begun</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">();</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Reached the end of headers, double CRLF</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">''</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Only the first colon separates key and value; values like localhost:9000 contain colons</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">rest</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">':'</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeKey</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">key</span><span class="p">.</span><span class="nx">toLowerCase</span><span class="p">();</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">].</span><span class="nx">push</span><span class="p">(</span><span class="nx">rest</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s1">':'</span><span class="p">).</span><span class="nx">trimStart</span><span class="p">());</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="nx">CRLF</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">requestComplete</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="o">!</span><span class="nb">Object</span><span class="p">.</span><span class="nx">keys</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">).</span><span class="nx">length</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">contentLength</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="s1">'content-length'</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="c1">// Header values are strings, so convert before comparing with the numeric body length</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="p">.</span><span class="nx">method</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="s1">'GET'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="nb">Number</span><span class="p">(</span><span class="nx">contentLength</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">sendResponse</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">status</span><span class="o">:</span><span class="w"> </span><span class="mf">200</span><span class="p">,</span><span class="w"> </span><span class="nx">statusMessage</span><span class="o">:</span><span class="w"> </span><span class="s1">'OK'</span><span class="p">,</span><span class="w"> </span><span class="nx">body</span><span class="o">:</span><span class="w"> </span><span class="s1">''</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="p">.</span><span class="nx">path</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">'/hello-world.html'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">HELLO_WORLD</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">404</span><span class="p">;</span>
<span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">statusMessage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'NOT FOUND'</span><span class="p">;</span>
<span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">NOT_FOUND</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">serialized</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sb">`HTTP/1.1 ${response.status} ${response.statusMessage}`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">CRLF</span><span class="w"> </span><span class="o">+</span>
<span class="w"> </span><span class="s1">'Content-Length: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">CRLF</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">CRLF</span><span class="w"> </span><span class="o">+</span>
<span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">body</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">serialized</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">handle</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">buffer</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">requestComplete</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">sendResponse</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// Otherwise the client may try to reuse the connection, which we don't support.</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="p">.</span><span class="nx">end</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">handleConnection</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="p">(</span><span class="nx">connection</span><span class="p">);</span>
<span class="w"> </span><span class="nx">connection</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'data'</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">handler</span><span class="p">.</span><span class="nx">handle</span><span class="p">(</span><span class="nx">buffer</span><span class="p">));</span>
<span class="p">}</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">createServer</span><span class="p">(</span><span class="nx">handleConnection</span><span class="p">);</span>
<span class="nx">server</span><span class="p">.</span><span class="nx">listen</span><span class="p">(</span><span class="s1">'9000'</span><span class="p">);</span>
</pre></div>
<p>So what's going on?</p>
<h3 id="the-protocol">The protocol</h3><p>HTTP (version 1.1, specifically) is a convention for connecting over
TCP/IP and sending plain-text messages between two processes. HTTP
messages are broken into two categories: requests (the sender of a
request is called a "client") and responses (the sender of a response
is called a "server").</p>
<p>HTTP is important because it is the default protocol of web
browsers. When we type in <code>localhost:9000/hello-world.html</code>
and hit enter, the browser will open a TCP/IP connection to the
location <code>localhost</code> on the port <code>9000</code> and send
an HTTP request. If/when it receives the HTTP response from the server
it will try to render the response.</p>
<h4 id="an-http-request">An HTTP request</h4><p>A bare minimum HTTP/1.1 request (<a href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html">defined
here</a>) based
on the request for <code>localhost:9000/hello-world.html</code> is the
following:</p>
<div class="highlight"><pre><span></span>GET /hello-world.html HTTP/1.1\r\nHost: localhost:9000\r\n\r\n
</pre></div>
<p class="note">
The spec explicitly requires the <code>\r\n</code> combo to
represent a newline instead of simply <code>\n</code>.
</p><p>If we printed out this request it would look like this:</p>
<div class="highlight"><pre><span></span>GET /hello-world.html HTTP/1.1
Host: localhost:9000
</pre></div>
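<p>To make the browser's side concrete, here is a rough sketch of how
that request text could be produced and sent. The helper names
<code>buildRequest</code> and <code>send</code> are hypothetical, and
the sketch assumes a plain <code>http://</code> address; a real browser
sends many more headers than this.</p>

```javascript
const net = require('net');

// Hypothetical helper: turns what was typed in the address bar into request text.
function buildRequest(address) {
  // Assume http:// when no scheme is given, as in the example address.
  const url = new URL('http://' + address);
  const request = `GET ${url.pathname} HTTP/1.1\r\nHost: ${url.host}\r\n\r\n`;
  // url.port is an empty string when the address uses the default port.
  return { host: url.hostname, port: Number(url.port || 80), request };
}

// Hypothetical helper: opens a TCP connection, writes the request, and
// resolves with the raw response text once the server closes the connection.
function send(address) {
  const { host, port, request } = buildRequest(address);
  return new Promise((resolve, reject) => {
    let response = '';
    const socket = net.connect({ host, port }, () => socket.write(request));
    socket.on('data', (chunk) => { response += chunk.toString(); });
    socket.on('end', () => resolve(response));
    socket.on('error', reject);
  });
}

const built = buildRequest('localhost:9000/hello-world.html');
console.log(JSON.stringify(built.request));
```

<p>With a server listening on port 9000,
<code>send('localhost:9000/hello-world.html').then(console.log)</code>
would print the raw response text.</p>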
<h4 id="components-of-an-http-request">Components of an HTTP request</h4><p>An HTTP/1.1 request is made up of a few parts:</p>
<ul>
<li>[Mandatory]: The status line (the first line, which the spec calls the request line) followed by a CRLF (the <code>\r\n</code> combo)</li>
<li>[Mandatory]: HTTP headers separated by a CRLF and followed by an additional CRLF</li>
<li>[Optional]: The request body</li>
</ul>
<p>The status line consists of the request method (e.g. GET, POST, PUT,
etc.), the path for the request, and the protocol -- all separated by
a space.</p>
<p>An HTTP header is a key-value pair separated by a colon. Spaces
following the colon are ignored. The key is case insensitive. Only
the <code>Host</code> header appears to be mandatory. Since these
headers are sent by the client they are intended for the server's use.</p>
<p>The request body is text and is only relevant for requests of certain
methods (e.g. POST but not GET).</p>
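<p>The three parts described above can be pulled apart with a few
string operations. This is only a sketch: <code>parseRequest</code> is
a hypothetical helper, it assumes the whole request has already
arrived as one string, and it keeps only the last value when a header
repeats.</p>

```javascript
// Hypothetical helper: splits one complete HTTP/1.1 request into its parts.
function parseRequest(raw) {
  const CRLF = '\r\n';
  // The first double CRLF separates the headers from the (optional) body.
  const headerEnd = raw.indexOf(CRLF + CRLF);
  if (headerEnd === -1) {
    throw new Error('Incomplete request: no blank line after the headers');
  }
  const [firstLine, ...headerLines] = raw.slice(0, headerEnd).split(CRLF);
  // Method, path, and protocol are separated by single spaces.
  const [method, path, protocol] = firstLine.split(' ');

  const headers = {};
  for (const line of headerLines) {
    // Split on the first colon only; values such as localhost:9000 contain colons.
    const colon = line.indexOf(':');
    const key = line.slice(0, colon).toLowerCase(); // keys are case insensitive
    headers[key] = line.slice(colon + 1).trim(); // surrounding spaces are ignored
  }
  const body = raw.slice(headerEnd + (CRLF + CRLF).length);
  return { method, path, protocol, headers, body };
}

const parsed = parseRequest('GET /hello-world.html HTTP/1.1\r\nHost: localhost:9000\r\n\r\n');
console.log(parsed);
```

<p>Splitting on the first colon only matters here: the
<code>Host</code> value itself contains a colon before the port.</p>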
<h4 id="an-http-response">An HTTP response</h4><p>A bare minimum HTTP/1.1 response (<a href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html">defined
here</a>) based
on the file we wanted to send back is the following:</p>
<div class="highlight"><pre><span></span>HTTP/1.1 200 OK\r\n\r\n<html>\n <body>\n <h1>Hello world!</h1>\n </body>\n</html>
</pre></div>
<p>If we printed out this response it would look like this:</p>
<div class="highlight"><pre><span></span>HTTP/1.1 200 OK
<html>
<body>
<h1>Hello world!</h1>
</body>
</html>
</pre></div>
<h4 id="components-of-an-http-response">Components of an HTTP response</h4><p>An HTTP/1.1 response is made up of a few parts:</p>
<ul>
<li>[Mandatory]: The status line (the first line) followed by a CRLF</li>
<li>[Optional]: HTTP headers separated by a CRLF and followed by an additional CRLF</li>
<li>[Optional]: The response body</li>
</ul>
<p>The status line consists of the protocol, the status code, and the
status message -- all separated by a space.</p>
<p>HTTP response headers are the same as HTTP request headers although in
a response they are directives from the server to the client. There
are many <a href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html">standard
headers</a> that
are used for such things as setting cache rules, setting cookies, and
setting the response type (e.g. HTML vs CSS vs PNG so the browser knows
how to handle the response).</p>
<p>The response body is similar to the HTTP request body.</p>
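<p>As a rough sketch of the status line format described above, a client
could split it like this. <code>parseStatusLine</code> is a hypothetical
name used only for illustration. It treats only the first two spaces as
separators because the status message may itself contain spaces.</p>

```javascript
// Hypothetical helper: split a response status line into protocol,
// status code, and status message.
function parseStatusLine(line) {
  const firstSpace = line.indexOf(' ');
  const secondSpace = line.indexOf(' ', firstSpace + 1);
  // Assumption for this sketch: if the message is missing, the status
  // code runs to the end of the line and the message is empty.
  const codeEnd = secondSpace === -1 ? line.length : secondSpace;
  return {
    protocol: line.slice(0, firstSpace),
    status: Number(line.slice(firstSpace + 1, codeEnd)),
    statusMessage: secondSpace === -1 ? '' : line.slice(secondSpace + 1),
  };
}

console.log(parseStatusLine('HTTP/1.1 200 OK'));
console.log(parseStatusLine('HTTP/1.1 404 Not Found'));
```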
<h3 id="sockets">Sockets</h3><p>Most operating systems have a built-in means of connecting over TCP/IP
(and sending and receiving messages) called "sockets". Sockets allow
us to treat TCP/IP connections like files in memory. Most programming
languages have a built-in socket library. Node.js provides a
high-level interface for listening on a port and handling new
connections.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">handleConnection</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">connection</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'data'</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">doSomething</span><span class="o">???</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">createServer</span><span class="p">(</span><span class="nx">handleConnection</span><span class="p">);</span>
<span class="nx">server</span><span class="p">.</span><span class="nx">listen</span><span class="p">(</span><span class="s1">'9000'</span><span class="p">);</span>
</pre></div>
<p>Once the program is listening, clients can open TCP/IP connections to
the address (<code>localhost</code>) and port (<code>9000</code>) and
our program takes over from there. Each connection is handled
separately and receives "data" events. Each data event includes new
bytes available for us to handle.</p>
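<p>To watch data events arrive, a tiny echo server can help. This is a
sketch rather than part of the web server itself:
<code>createEchoServer</code> and port <code>9001</code> are arbitrary
choices made here for illustration.</p>

```javascript
const net = require('net');

// Hypothetical demo server: logs the size of every chunk it receives and
// writes the same bytes back to the client.
function createEchoServer() {
  return net.createServer((connection) => {
    // 'data' fires once per chunk of bytes, not once per request.
    connection.on('data', (buffer) => {
      console.log(`received ${buffer.length} bytes`);
      connection.write(buffer);
    });
    connection.on('error', (err) => console.error('connection error:', err.message));
  });
}

const echoServer = createEchoServer();
// Uncomment to start listening; 9001 is an arbitrary port for this sketch.
// echoServer.listen(9001, () => console.log('listening on localhost:9001'));
```

<p>A large request may show up as several "received" lines, which is why the
handler in this post has to keep state between data events.</p>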
<p>Let's encapsulate the state of each connection in an HTTPRequestHandler
class. Its function will be to parse data as it becomes available and
respond to the request when the request is done.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">connection</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">requestComplete</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">sendResponse</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">handle</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">buffer</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">requestComplete</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">sendResponse</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// Otherwise the client may try to reuse the connection, which we don't support.</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="p">.</span><span class="nx">end</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">handleConnection</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="p">(</span><span class="nx">connection</span><span class="p">);</span>
<span class="w"> </span><span class="nx">connection</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">'data'</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">handler</span><span class="p">.</span><span class="nx">handle</span><span class="p">(</span><span class="nx">buffer</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">...</span>
</pre></div>
<p>There are three functions we need to implement
now: <code>parse(buffer)</code>, <code>requestComplete()</code>,
and <code>sendResponse()</code>.</p>
<h4 id="parse(buffer)">parse(buffer)</h4><p>This function will be responsible for progressively pulling out data
from the buffer. If the status line has not been received, it will try
to grab the status line. If the body has not yet started, it will
accumulate headers. Then it will continue accumulating the body until
we close the connection (this will happen implicitly when
<code>requestComplete()</code> returns true).</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">connection</span><span class="p">;</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">statusLine</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span>
<span class="w"> </span><span class="nx">headers</span><span class="o">:</span><span class="w"> </span><span class="p">{},</span>
<span class="w"> </span><span class="nx">body</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lines</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">buffer</span><span class="p">.</span><span class="nx">toString</span><span class="p">().</span><span class="nx">split</span><span class="p">(</span><span class="nx">CRLF</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Parse/store status line if necessary</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">method</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">,</span><span class="w"> </span><span class="nx">protocol</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">().</span><span class="nx">split</span><span class="p">(</span><span class="s1">' '</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">method</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">,</span><span class="w"> </span><span class="nx">protocol</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Parse/store headers if the body hasn't begun</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">();</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Reached the end of headers, double CRLF</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">''</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
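<span class="w"> </span><span class="c1">// Split on the first colon only; values such as localhost:9000 contain colons too.</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">separator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">indexOf</span><span class="p">(</span><span class="s1">':'</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="nx">line</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">separator</span><span class="p">),</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="nx">separator</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">)];</span>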
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeKey</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">key</span><span class="p">.</span><span class="nx">toLowerCase</span><span class="p">();</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">].</span><span class="nx">push</span><span class="p">(</span><span class="nx">value</span><span class="p">.</span><span class="nx">trimStart</span><span class="p">());</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="nx">CRLF</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">...</span>
<span class="p">}</span>
</pre></div>
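<p>One limitation of the <code>parse</code> function above is that it splits
each chunk on its own, so a header line that straddles two data events
would be treated as two separate lines. A possible way around this,
sketched under the assumption that the header portion is plain ASCII, is to
hold back the trailing partial line until the rest arrives.
<code>LineBuffer</code> is a hypothetical helper, not part of the original
code.</p>

```javascript
const CRLF = '\r\n';

// Hypothetical helper: accumulates chunks and returns only complete lines,
// keeping any trailing partial line until the rest of it arrives.
class LineBuffer {
  constructor() {
    this.pending = '';
  }

  push(chunk) {
    const parts = (this.pending + chunk.toString()).split(CRLF);
    // The last element is whatever followed the final CRLF (possibly '').
    this.pending = parts.pop();
    return parts;
  }
}

const lines = new LineBuffer();
console.log(lines.push('GET /hello-world.html HT')); // []
console.log(lines.push('TP/1.1\r\nHost: local'));    // [ 'GET /hello-world.html HTTP/1.1' ]
console.log(lines.push('host:9000\r\n\r\n'));        // [ 'Host: localhost:9000', '' ]
```

<p>The empty string in the last result is the blank line that marks the end
of the headers. The body is not line-oriented, so this approach only applies
up to that point.</p>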
<h4 id="requestcomplete()">requestComplete()</h4><p>This function will look at the internal request state and return false
if the status line has not been received, no headers have been
received (although this is stricter than the HTTP/1.1 standard
requires), or if the body length is not equal to the value of the
<code>Content-Length</code> header.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="w"> </span><span class="p">{</span>
<span class="p">...</span>
<span class="w"> </span><span class="nx">requestComplete</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="o">!</span><span class="nb">Object</span><span class="p">.</span><span class="nx">keys</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">).</span><span class="nx">length</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">contentLength</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="s1">'content-length'</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="c1">// Header values are strings, so convert to a number before comparing.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="p">.</span><span class="nx">method</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="s1">'GET'</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="nb">Number</span><span class="p">(</span><span class="nx">contentLength</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">...</span>
<span class="p">}</span>
</pre></div>
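<p>The body-length check can also be written as a standalone function. The
sketch below makes two assumptions that go beyond the code above: a request
without a <code>Content-Length</code> header is treated as having no body,
and the comparison uses the byte length of the body, since
<code>Content-Length</code> counts bytes rather than characters.
<code>bodyComplete</code> is a hypothetical name.</p>

```javascript
// Hypothetical helper: decide whether the whole body has arrived.
// `headers` uses the same shape as above: lowercase keys mapped to arrays
// of string values.
function bodyComplete(headers, body) {
  const [contentLength] = headers['content-length'] || [];
  if (contentLength === undefined) {
    // Assumption for this sketch: no Content-Length means no body.
    return true;
  }
  // Header values are strings; Buffer.byteLength counts UTF-8 bytes.
  return Buffer.byteLength(body) >= Number(contentLength);
}

console.log(bodyComplete({ 'content-length': ['5'] }, 'hel'));   // false
console.log(bodyComplete({ 'content-length': ['5'] }, 'hello')); // true
console.log(bodyComplete({}, ''));                               // true
```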
<h4 id="sendresponse()">sendResponse()</h4><p>Finally we'll hard-code two responses (one for the valid request for
/hello-world.html and a catch-all 404 response for every other
request). These responses need to be serialized according to the HTTP
response format described above and written to the connection.</p>
<div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="n">HTTPRequestHandler</span><span class="w"> </span><span class="p">{</span>
<span class="o">...</span>
<span class="w"> </span><span class="n">sendResponse</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">status</span><span class="p">:</span><span class="w"> </span><span class="mi">200</span><span class="p">,</span><span class="w"> </span><span class="n">statusMessage</span><span class="p">:</span><span class="w"> </span><span class="s1">'OK'</span><span class="p">,</span><span class="w"> </span><span class="n">body</span><span class="p">:</span><span class="w"> </span><span class="s1">''</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">statusLine</span><span class="o">.</span><span class="n">path</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">'/hello-world.html'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">HELLO_WORLD</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">404</span><span class="p">;</span>
<span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">statusMessage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'NOT FOUND'</span><span class="p">;</span>
<span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">NOT_FOUND</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">serialized</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">`HTTP/1.1 ${response.status} ${response.statusMessage}`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">CRLF</span><span class="w"> </span><span class="o">+</span>
<span class="w"> </span><span class="s1">'Content-Length: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">length</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">CRLF</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">CRLF</span><span class="w"> </span><span class="o">+</span>
<span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">body</span><span class="p">;</span>
<span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">connection</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">serialized</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span>
<span class="o">...</span>
<span class="p">}</span>
</pre></div>
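<p>The serialization step could be pulled out into its own function, roughly
as follows. This is a sketch under a few assumptions:
<code>serializeResponse</code> and its <code>headers</code> parameter are
hypothetical, and <code>Content-Length</code> is computed with
<code>Buffer.byteLength</code> so that it stays correct if the body contains
non-ASCII characters.</p>

```javascript
const CRLF = '\r\n';

// Hypothetical helper: build the full response text from a status, a
// status message, an optional body, and optional extra headers.
function serializeResponse({ status, statusMessage, body = '', headers = {} }) {
  // Content-Length is always derived from the body, in bytes.
  const allHeaders = { ...headers, 'Content-Length': Buffer.byteLength(body) };
  const headerLines = Object.entries(allHeaders).map(([key, value]) => `${key}: ${value}`);
  // The empty string yields the blank line between headers and body.
  return [`HTTP/1.1 ${status} ${statusMessage}`, ...headerLines, '', body].join(CRLF);
}

console.log(JSON.stringify(serializeResponse({ status: 200, statusMessage: 'OK', body: 'hi' })));
```

<p>For an ASCII-only body the byte length and the string length are the
same, so the result matches what the string concatenation above produces.</p>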
<h3 id="run-it">Run it</h3><p>Now that we've got all the pieces we can finally run the initial program:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>uweb.js<span class="w"> </span><span class="p">&</span>
$<span class="w"> </span>open<span class="w"> </span>localhost:9000/hello-world.html
</pre></div>
<p>And we see the page! Try any other path and we receive a 404.</p>
<h3 id="review-and-next-steps">Review and next steps</h3><p>We covered the basics of HTTP/1.1: a very simple, plain-text protocol
oriented around requests and responses over a TCP/IP connection. It
turns out we need to know little beyond parsing and formatting text on
top of the TCP/IP black box called sockets. We
created a simple application that returns different responses based on
the request. But we're a far cry from a more general library, a web
framework. Future posts will explore this transition as well as
performance and more features.</p>
<p><a href="https://github.com/eatonphil/uweb">Code is available on GitHub</a>.</p>
http://notes.eatonphil.com/web-server-basics-http-and-sockets.htmlSat, 06 Apr 2019 00:00:00 +0000
- Writing a simple JSON path parserhttp://notes.eatonphil.com/writing-a-simple-json-path-parser.html<p>Let's say we want to implement a simple list filtering language so
we can enter <code>a.b = 12</code> and return only results in a
list where the <code>a</code> column is an object that contains a
field <code>b</code> that is set to the value 12. What would a
<code>filter(jsonPath, equals, listOfObjects)</code> function look
like?</p>
<p>If we only needed to support object lookup, we might
implement <code>filter</code> by splitting the <code>jsonPath</code>
on periods and looking at each object in the <code>listOfObjects</code>
for matching values. It might look something like this:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">filter</span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">,</span><span class="w"> </span><span class="nx">equals</span><span class="p">,</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">jsonPath</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">'.'</span><span class="p">);</span>
<span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">filterSingle</span><span class="p">(</span><span class="nx">object</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">object</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="p">;</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="o">++</span><span class="nx">i</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="p">[</span><span class="nx">part</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">equals</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">.</span><span class="nx">filter</span><span class="p">(</span><span class="nx">filterSingle</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">require</span><span class="p">(</span><span class="s1">'assert'</span><span class="p">).</span><span class="nx">deepEqual</span><span class="p">(</span>
<span class="w"> </span><span class="nx">filter</span><span class="p">(</span><span class="s1">'foo.bar'</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">,</span><span class="w"> </span><span class="p">[{</span><span class="w"> </span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">bar</span><span class="o">:</span><span class="w"> </span><span class="mf">12</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">}]),</span>
<span class="w"> </span><span class="p">[{</span><span class="w"> </span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">bar</span><span class="o">:</span><span class="w"> </span><span class="mf">12</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}],</span>
<span class="p">);</span>
</pre></div>
<p>That doesn't work too badly. We haven't handled edge cases like a
<code>jsonPath</code> of <code>foo..bar</code> or
<code>bar.</code>. But those would not be difficult to handle:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">filter</span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">,</span><span class="w"> </span><span class="nx">equals</span><span class="p">,</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">.</span><span class="nx">charAt</span><span class="p">(</span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">'.'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path cannot begin with a dot, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">jsonPath</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">.</span><span class="nx">charAt</span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">'.'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path cannot end with a dot, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">jsonPath</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">jsonPath</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">'.'</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">parts</span><span class="p">.</span><span class="nx">reduce</span><span class="p">((</span><span class="nx">hasEmptyPart</span><span class="p">,</span><span class="w"> </span><span class="nx">part</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">hasEmptyPart</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">part</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path cannot contain an empty section, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">jsonPath</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">filterSingle</span><span class="p">(</span><span class="nx">object</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">object</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="p">;</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="o">++</span><span class="nx">i</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="p">[</span><span class="nx">part</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">equals</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">.</span><span class="nx">filter</span><span class="p">(</span><span class="nx">filterSingle</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>And we now handle the most obvious invalid path cases.</p>
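<p>As a quick sanity check, here is a small sketch that exercises those three rules on their own. The <code>validatePath</code> and <code>pathError</code> helpers are hypothetical (they are not part of the code above), and the empty-section check uses <code>Array.prototype.some</code> rather than <code>reduce</code>, which should behave the same way here.</p>

```javascript
// Hypothetical helper: the same three validation rules as in `filter`,
// pulled out so they can be exercised without any objects to filter.
function validatePath(jsonPath) {
  if (jsonPath.charAt(0) === '.') {
    throw new Error('JSON path cannot begin with a dot, in: ' + jsonPath);
  } else if (jsonPath.charAt(jsonPath.length - 1) === '.') {
    throw new Error('JSON path cannot end with a dot, in: ' + jsonPath);
  }
  const parts = jsonPath.split('.');
  // `some` stops at the first empty part it finds.
  if (parts.some((part) => part.length === 0)) {
    throw new Error('JSON path cannot contain an empty section, in: ' + jsonPath);
  }
  return parts;
}

// Hypothetical helper: returns the error message for a bad path,
// or null if the path passes validation.
function pathError(jsonPath) {
  try {
    validatePath(jsonPath);
    return null;
  } catch (e) {
    return e.message;
  }
}

for (const path of ['foo.bar', '.foo', 'bar.', 'foo..bar']) {
  console.log(path, '=>', pathError(path));
}
```

<p>Only the first path should come back with <code>null</code>; each of the others trips a different rule.</p>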
<h3 id="arrays?">Arrays?</h3><p>If we want to support array path syntax, things get harder. For
example:</p>
<div class="highlight"><pre><span></span><span class="nx">require</span><span class="p">(</span><span class="s1">'assert'</span><span class="p">).</span><span class="nx">deepEqual</span><span class="p">(</span>
<span class="w"> </span><span class="nx">filter</span><span class="p">(</span><span class="s1">'foo.bar[0].biz'</span><span class="p">,</span><span class="w"> </span><span class="mf">14</span><span class="p">,</span><span class="w"> </span><span class="p">[{</span><span class="w"> </span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">bar</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">biz</span><span class="o">:</span><span class="w"> </span><span class="mf">14</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">biz</span><span class="o">:</span><span class="w"> </span><span class="mf">19</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">bar</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}]),</span>
<span class="w"> </span><span class="p">[{</span><span class="w"> </span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">bar</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">biz</span><span class="o">:</span><span class="w"> </span><span class="mf">14</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">biz</span><span class="o">:</span><span class="w"> </span><span class="mf">19</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}],</span>
<span class="p">);</span>
</pre></div>
<p>We could try to stick with the hammer that is
<code>String.prototype.split</code> and write some really messy
code. :) Or we could switch to an approach that gives us more
control. Let's do that.</p>
<p>We'll build a very simple lexer that will iterate over each character
accumulating characters into individual tokens that represent the
pieces of the path. Let's start by supporting the original
<code>jsonPath</code> syntax and error-handling.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">getJsonPathParts</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'.'</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">currentToken</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path cannot contain empty section, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">);</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">c</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">currentToken</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path cannot end with dot, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">parts</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">filter</span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">,</span><span class="w"> </span><span class="nx">equals</span><span class="p">,</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">getJsonPathParts</span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">);</span>
<span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">filterSingle</span><span class="p">(</span><span class="nx">object</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">object</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span>
<span class="w"> </span><span class="c1">// Check the index, not the part, so a falsy part (such as array index 0) doesn't end the loop early.</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="p">;</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="o">++</span><span class="nx">i</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="p">[</span><span class="nx">part</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">equals</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">.</span><span class="nx">filter</span><span class="p">(</span><span class="nx">filterSingle</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Not too bad!</p>
<h3 id="arrays?">Arrays?</h3><p>Right. Let's build on <code>getJsonPathParts</code> to support array
syntax. Along with that we're going to impose some restrictions. The
object path parts must be only alphanumeric characters plus dashes and
underscores. The array index must only be numeric characters. Anything
else should throw an error.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">getJsonPathParts</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">inArray</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'.'</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">''</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path cannot contain empty section, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">);</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'['</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">inArray</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path contains unexpected left bracket, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">''</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path cannot contain empty section, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">);</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="nx">inArray</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">']'</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">inArray</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path contains unexpected right bracket, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">''</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path array index must not be empty, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Array indices are recorded as numbers, not strings.</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">parseInt</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">,</span><span class="w"> </span><span class="mf">10</span><span class="p">);</span>
<span class="w"> </span><span class="nx">inArray</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">inArray</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">code</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s1">'0'</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="s1">'9'</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">c</span><span class="p">;</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path array index must be numeric, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
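<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">((</span><span class="nx">code</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s1">'A'</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="s1">'Z'</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">))</span><span class="w"> </span><span class="o">||</span>
<span class="w"> </span><span class="p">(</span><span class="nx">code</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s1">'a'</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="s1">'z'</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">))</span><span class="w"> </span><span class="o">||</span>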
<span class="w"> </span><span class="p">(</span><span class="nx">code</span><span class="w"> </span><span class="o">>=</span><span class="w"> </span><span class="s1">'0'</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="s1">'9'</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">))</span><span class="w"> </span><span class="o">||</span>
<span class="w"> </span><span class="p">[</span><span class="s1">'-'</span><span class="p">,</span><span class="w"> </span><span class="s1">'_'</span><span class="p">].</span><span class="nx">includes</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">c</span><span class="p">;</span>
<span class="w"> </span><span class="k">continue</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path part must contain only alphanumeric characters, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">''</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'JSON path cannot end with dot, in: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">parts</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">require</span><span class="p">(</span><span class="s1">'assert'</span><span class="p">).</span><span class="nx">deepEqual</span><span class="p">(</span><span class="nx">getJsonPathParts</span><span class="p">(</span><span class="s1">'foo.bar[0].biz'</span><span class="p">),</span><span class="w"> </span><span class="p">[</span><span class="s1">'foo'</span><span class="p">,</span><span class="w"> </span><span class="s1">'bar'</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">'biz'</span><span class="p">]);</span>
</pre></div>
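<p>One thing to watch when feeding these parts into a filter: an array index of <code>0</code> is a falsy value, so a loop that keeps going only while the current part is truthy would stop early. Here is a minimal sketch of a walker that loops on the index instead. The name <code>filterByParts</code> is made up for this example, and it takes already-parsed parts so that it stands on its own.</p>

```javascript
// Hypothetical variant of `filter` that takes already-parsed parts,
// e.g. the output of getJsonPathParts('foo.bar[0].biz').
function filterByParts(parts, equals, listOfObjects) {
  function filterSingle(object) {
    let objectAtPath = object;
    let i = 0;
    // Loop on the index rather than on the part itself: the part may be
    // the array index 0, which is falsy. Stop early on null/undefined
    // because there is nothing left to descend into.
    for (; i < parts.length && objectAtPath !== null && objectAtPath !== undefined; i++) {
      objectAtPath = objectAtPath[parts[i]];
    }
    // Only a fully consumed path can match.
    return i === parts.length && objectAtPath === equals;
  }
  return listOfObjects.filter(filterSingle);
}

const parts = ['foo', 'bar', 0, 'biz'];
const objects = [
  { foo: { bar: [{ biz: 14 }, { biz: 19 }] } },
  { foo: { bar: null } },
];
require('assert').deepStrictEqual(filterByParts(parts, 14, objects), [objects[0]]);
```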
<p>Now we've got a simple JSON path parser with decent error handling! Of
course we wouldn't want to use this little library in production until
we had some serious test coverage. But writing tests and calling out
my mistakes will be left here as an exercise for the reader. :)</p>
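<p>If you do take up the exercise, the character classes are a good place to start. Here is a rough sketch of the two rules as standalone predicates; <code>isIndexChar</code> and <code>isPartChar</code> are hypothetical names, not part of the parser above. Note that upper- and lowercase letters are two separate ranges in ASCII: a single range from <code>'A'</code> to <code>'z'</code> would also let through the backslash, caret, and backtick.</p>

```javascript
// Hypothetical helpers, not part of the parser above: one predicate per
// character class the path syntax allows. Each expects a single character.
function isIndexChar(c) {
  return c >= '0' && c <= '9';
}

function isPartChar(c) {
  // Upper- and lowercase letters are separate ASCII ranges; the
  // characters [ \ ] ^ _ and ` sit between 'Z' and 'a'.
  return (c >= 'A' && c <= 'Z') ||
    (c >= 'a' && c <= 'z') ||
    isIndexChar(c) ||
    c === '-' ||
    c === '_';
}

for (const c of 'azAZ09-_') {
  require('assert').ok(isPartChar(c), 'expected to accept ' + c);
}
for (const c of '\\^`[]. ') {
  require('assert').ok(!isPartChar(c), 'expected to reject ' + c);
}
require('assert').ok(isIndexChar('7'));
require('assert').ok(!isIndexChar('x'));
```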
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New (short) post on parsing JSON paths in JavaScript <a href="https://t.co/mIjOMugA7C">https://t.co/mIjOMugA7C</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1111262461074784256?ref_src=twsrc%5Etfw">March 28, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/writing-a-simple-json-path-parser.htmlWed, 27 Mar 2019 00:00:00 +0000
- Writing a lisp compiler from scratch in JavaScript: 3. LLVMhttp://notes.eatonphil.com/compiler-basics-llvm.html<p class="note">
Previously in compiler basics:
<! forgive me, for I have sinned >
<br />
<a href="/compiler-basics-lisp-to-assembly.html">1. lisp to assembly</a>
<br />
<a href="/compiler-basics-functions.html">2. user-defined functions and variables</a>
<br />
<br/>
Next in compiler basics:
<br />
<a href="/compiler-basics-llvm-conditionals.html">4. LLVM conditionals and compiling fibonacci</a>
<br />
<a href="/compiler-basics-llvm-system-calls.html">5. LLVM system calls</a>
<br />
<a href="/compiler-basics-an-x86-upgrade.html">6. an x86 upgrade</a>
</p><p>In this post we'll extend the
<a href="https://github.com/eatonphil/ulisp">compiler</a> to emit <a href="https://llvm.org/docs/LangRef.html">LLVM
IR</a> as an option instead of x86
assembly.</p>
<p><a href="https://github.com/eatonphil/ulisp">All source code is available on Github</a>.</p>
<p>LLVM IR is a portable, human-readable, typed, assembly-like syntax
that LLVM can apply <a href="https://llvm.org/docs/Passes.html">optimizations</a>
to before generating assembly for the target architecture. Many
language implementors choose to compile to LLVM IR specifically to
avoid needing to implement sophisticated optimizations.</p>
<p>But the biggest reason I'm adding an LLVM backend is so that I can
punt on implementing <a href="https://en.wikipedia.org/wiki/Register_allocation">register
allocation</a>. This
is the technique that allows you to generically use as many registers
as possible before storing local variables on the stack. While
register allocation algorithms are not <em>that</em> difficult, I got
bored/lazy trying to implement this for ulisp. And LLVM IR provides
"infinite" locals that LLVM maps as needed to registers and the stack,
which means LLVM does the register allocation for us.</p>
<h3 id="llvm-ir-basics">LLVM IR basics</h3><p>In LLVM IR, all local variables must be prefixed
with <code>%</code>. All global variables (including function names)
must be prefixed with <code>@</code>. LLVM IR must be in
<a href="https://www.cs.cmu.edu/~fp/courses/15411-f08/lectures/09-ssa.pdf">static single
assignment</a>
(SSA) form, which means that no variable is assigned
twice. Additionally, literals cannot be assigned to variables
directly. So we'll work around that by adding 0 to the
literal. Furthermore, we'll take advantage of
the <code>add</code>, <code>sub</code>, and <code>mul</code>
operations built into LLVM IR.</p>
<div class="highlight"><pre><span></span><span class="c">; x = 4</span>
<span class="nv">%x</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
</pre></div>
<p>The type that the operation is operating on must be specified after
the operation name. In this case we are specifying
that <code>add</code> is operating on and returning 32-bit integers.</p>
<p>While this might seem very inefficient, we'll see in the end that
LLVM easily optimizes this away.</p>
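<p>As a rough sketch of how a text-based generator can produce lines
like the one above, here is a tiny helper. The name <code>arith</code>
is made up for illustration and is not part of ulisp; it only covers
the three operations mentioned so far and assumes 32-bit integers.</p>

```javascript
// Hypothetical helper (not from ulisp): formats one LLVM IR arithmetic
// instruction as text. Only the ops mentioned in this post are accepted.
const OPS = new Set(['add', 'sub', 'mul']);

function arith(op, destination, left, right) {
  if (!OPS.has(op)) {
    throw new Error('Unsupported operation ' + op);
  }
  // Locals get the % prefix; the type follows the operation name.
  return `%${destination} = ${op} i32 ${left}, ${right}`;
}

// A literal is "assigned" by adding 0 to it.
console.log(arith('add', 'x', 4, 0)); // %x = add i32 4, 0
// Operands that are locals need their own % prefix.
console.log(arith('mul', 'y', '%x', '%x')); // %y = mul i32 %x, %x
```

<p>Note that each destination is written exactly once, which is what
SSA form requires.</p>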
<h4 id="function-definition">Function definition</h4><p>Functions are defined at the top-level and are much simpler than x86
assembly since the details of calling conventions are handled by LLVM.</p>
<div class="highlight"><pre><span></span><span class="c">; (def plus (a b) (+ a b))</span>
<span class="k">define</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="vg">@plus</span><span class="w"> </span><span class="p">(</span><span class="kt">i32</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">%res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="nv">%b</span>
<span class="w"> </span><span class="k">ret</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%res</span>
<span class="p">}</span>
</pre></div>
<p>In ulisp, all functions will return a result (and the only supported
type for now is the 32-bit integer). So we annotate the definition with
this return type (<code>i32</code> in <code>define
i32</code>). Finally, we return inside the function with
the <code>ret</code> instruction that must also specify the type
(again <code>i32</code>).</p>
<h4 id="generating-llvm-ir">Generating LLVM IR</h4><p>We are going to generate LLVM IR as text. But any large project will
benefit from generating LLVM IR via the LLVM
<a href="http://llvm.org/docs/ProgrammersManual.html">API</a>.</p>
<h3 id="supporting-multiple-backends">Supporting multiple backends</h3><p>The goal is to be able to switch at compile-time between generating
x86 assembly or generating LLVM IR. So we'll need to reorganize ulisp
a little bit.</p>
<p>We'll edit <code>src/ulisp.js</code> to accept a second argument to
specify the backend (and from now on we'll default to LLVM).</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">cp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'child_process'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">parse</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'./parser'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">backends</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'./backend'</span><span class="p">);</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">(</span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">input</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="nx">args</span><span class="p">[</span><span class="mf">2</span><span class="p">]).</span><span class="nx">toString</span><span class="p">();</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">backend</span><span class="p">;</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">[</span><span class="mf">3</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'llvm'</span><span class="o">:</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kc">undefined</span><span class="o">:</span>
<span class="w"> </span><span class="nx">backend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">backends</span><span class="p">.</span><span class="nx">llvm</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'x86'</span><span class="o">:</span>
<span class="w"> </span><span class="nx">backend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">backends</span><span class="p">.</span><span class="nx">x86</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
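<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s1">'Unsupported backend '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">args</span><span class="p">[</span><span class="mf">3</span><span class="p">]);</span>
<span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">exit</span><span class="p">(</span><span class="mf">1</span><span class="p">);</span>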
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">ast</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">input</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">program</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">backend</span><span class="p">.</span><span class="nx">compile</span><span class="p">(</span><span class="nx">ast</span><span class="p">);</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">mkdirSync</span><span class="p">(</span><span class="s1">'build'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">(</span><span class="nx">e</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">backend</span><span class="p">.</span><span class="nx">build</span><span class="p">(</span><span class="s1">'build'</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">main</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">);</span>
</pre></div>
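<p>The <code>switch</code> above could also be written as a lookup
that fails loudly on an unknown name. This is only a sketch: the
<code>selectBackend</code> helper and the stub backend objects are
made up for illustration and stand in for the real modules under
<code>src/backend</code>.</p>

```javascript
// Stand-in backends for illustration only; the real ones live in
// src/backend and do actual code generation.
const backends = {
  llvm: { name: 'llvm' },
  x86: { name: 'x86' },
};

// Picks a backend by name, defaulting to LLVM, and throws on anything
// unknown instead of continuing with an undefined backend.
function selectBackend(name = 'llvm') {
  if (!Object.prototype.hasOwnProperty.call(backends, name)) {
    throw new Error('Unsupported backend ' + name);
  }
  return backends[name];
}

console.log(selectBackend(undefined).name); // llvm
console.log(selectBackend('x86').name); // x86
```

<p>Passing <code>undefined</code> (no second command-line argument)
triggers the default parameter, which matches the "default to LLVM"
behavior described above.</p>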
<h3 id="the-llvm-backend">The LLVM backend</h3><p>We'll add <code>src/backend/llvm.js</code> and expose
<code>compile</code> and <code>build</code> functions.</p>
<h4 id="compile(ast)">compile(ast)</h4><p>This will work the same as it did for the x86 backend, creating a new
<code>Compiler</code> helper object, creating a scope manager (which
we'll get into in more detail shortly), and generating code from the
AST wrapped in a <code>begin</code>.</p>
<div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Compiler</span><span class="p">();</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">scope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Scope</span><span class="p">();</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(),</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">getOutput</span><span class="p">();</span>
<span class="p">};</span>
</pre></div>
<h4 id="build(builddir,-output)">build(buildDir, output)</h4><p>The job of <code>build</code> will be to clean up the build directory,
write any output as needed to the directory, and compile the written
output. Since we're dealing with LLVM IR, we first call
<a href="https://llvm.org/docs/CommandGuide/llc.html">llc</a> on the IR file to
get an assembly file. Then we can call <code>gcc</code> on the
assembly to get a binary output.</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">cp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'child_process'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="p">...</span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">build</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="p">(</span><span class="nx">buildDir</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prog</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'prog'</span><span class="p">;</span>
<span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">writeFileSync</span><span class="p">(</span><span class="nx">buildDir</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="sb">`/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.ll`</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">);</span>
<span class="w"> </span><span class="nx">cp</span><span class="p">.</span><span class="nx">execSync</span><span class="p">(</span><span class="sb">`llc -o </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.s </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.ll`</span><span class="p">);</span>
<span class="w"> </span><span class="nx">cp</span><span class="p">.</span><span class="nx">execSync</span><span class="p">(</span><span class="sb">`gcc -o </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb"> </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.s`</span><span class="p">);</span>
<span class="p">};</span>
</pre></div>
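<p>To make the two-step pipeline easier to inspect, the command
strings can be built separately from running them. The
<code>buildCommands</code> helper below is hypothetical (it is not in
ulisp) and only formats the same <code>llc</code> and <code>gcc</code>
invocations used above without executing anything.</p>

```javascript
// Hypothetical helper: returns the commands build() would run, in order,
// without executing them. Useful for logging or a dry run.
function buildCommands(buildDir, prog) {
  const base = `${buildDir}/${prog}`;
  return [
    `llc -o ${base}.s ${base}.ll`, // LLVM IR -> assembly
    `gcc -o ${base} ${base}.s`, // assembly -> executable
  ];
}

buildCommands('build', 'prog').forEach((cmd) => console.log(cmd));
```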
<h3 id="taking-advantage-of-locals">Taking advantage of locals</h3><p>Before we get too far into the specifics of LLVM IR code generation,
let's build out the infrastructure to take advantage of "infinite"
locals. In particular, we want a local-manager (<code>Scope</code>)
with four functions:</p>
<ul>
<li><code>register(local: name)</code>: for tracking user variables and mapping to safe names</li>
<li><code>symbol()</code>: for tracking internal temporary variables</li>
<li><code>get(local: name)</code>: for returning the safe name of a user variable</li>
<li><code>copy()</code>: for duplicating the local-tracker when we enter a new scope</li>
</ul>
<p>It is important to track and map user variables into safe names so we
don't accidentally conflict between variable names used by the user
and names used by the compiler itself.</p>
<h4 id="register(local)">register(local)</h4><p>When we register, we'll want to replace any unsafe characters that
Lisp allows but LLVM likely won't. For now, we'll just replace any
dashes in the name (since dashes are fine in variables in Lisp) with
underscores. Then we'll add a number to the end of the local name
until we have a safe name that doesn't exist already. Finally we
return that safe name after storing the mapping.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">register</span><span class="p">(</span><span class="nx">local</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Replace every dash, not just the first one.</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">local</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/-/g</span><span class="p">,</span><span class="w"> </span><span class="s1">'_'</span><span class="p">);</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">safe</span><span class="p">;</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">[</span><span class="nx">copy</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">n</span><span class="o">++</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">[</span><span class="nx">local</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">copy</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">copy</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<h4 id="symbol()">symbol()</h4><p>This is a simple function that will return one new unused safe name
that we can store things in.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">symbol</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">nth</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Object</span><span class="p">.</span><span class="nx">keys</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">).</span><span class="nx">length</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">register</span><span class="p">(</span><span class="s1">'sym'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">nth</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<p>We start off by making up a name based on the prefix <code>sym</code>
and a suffix of one more than the current number of registered locals,
and pass that into the <code>register</code> function to make sure we
get a safe name.</p>
<h4 id="get(local)">get(local)</h4><p>This function is a very simple lookup to return the safe name for a
user variable. It is up to the caller of this function to handle if
the user variable does not exist in scope (and perhaps throw a
compiler error back to the programmer).</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">get</span><span class="p">(</span><span class="nx">local</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">[</span><span class="nx">local</span><span class="p">];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<h4 id="copy()">copy()</h4><p>Finally, we want to expose a copy function so we can duplicate the
local storage before entering a new scope. (A variable inside a
function should not exist in scope outside the function.)</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">copy</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Scope</span><span class="p">();</span>
<span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">locals</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
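<p>Putting the four functions together, here is a condensed,
self-contained sketch of <code>Scope</code> with a short usage example.
It follows the description above, with one assumption called out in the
comments: dashes are replaced using a global regular expression so that
every dash is converted.</p>

```javascript
class Scope {
  constructor() {
    this.locals = {};
  }

  register(local) {
    // Assumption: a global regex so that every dash is replaced.
    const safe = local.replace(/-/g, '_');
    let copy = safe;
    let n = 1;
    while (this.locals[copy]) {
      copy = safe + n++;
    }
    this.locals[local] = copy;
    return copy;
  }

  symbol() {
    const nth = Object.keys(this.locals).length + 1;
    return this.register('sym' + nth);
  }

  get(local) {
    return this.locals[local];
  }

  copy() {
    const c = new Scope();
    c.locals = { ...this.locals };
    return c;
  }
}

const outer = new Scope();
console.log(outer.register('my-long-var')); // my_long_var
console.log(outer.symbol()); // sym2

// Entering a function: work on a copy so new locals do not leak out.
const inner = outer.copy();
inner.register('x');
console.log(inner.get('my-long-var')); // my_long_var
console.log(outer.get('x')); // undefined
```

<p>Because <code>copy()</code> makes a shallow copy of the mapping,
names registered in the inner scope never show up in the outer one,
while the inner scope still sees everything that was registered before
the copy.</p>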
<h3 id="back-to-codegen!">Back to codegen!</h3><p>As mentioned in <code>module.exports.compile</code>, we're going to
use a <code>Compiler</code> that exposes a number of compiler helpers:</p>
<ul>
<li><code>emit(depth, code)</code>: an internal helper for outputting indented lines of code</li>
<li><code>compileBegin(ast, destination, scope)</code>: compiles a begin block</li>
<li><code>compileExpression(ast, destination, scope)</code>: compiles variable references, literals, and passes on function calls</li>
<li><code>compileCall(functionName, ast, destination, scope)</code>: compiles a function call</li>
<li><code>compileDefine([functionName, parameters, ...body], destination, scope)</code>: compiles a function definition</li>
<li><code>compileOp(op)</code>: helper function for generating code for primitive operations like <code>add</code></li>
<li><code>getOutput()</code>: returns the code generated by the compiler</li>
</ul>
<h4 id="emit(depth,-code)">emit(depth, code)</h4><p>Like we had in the x86 backend, this will indent the code two spaces
<code>depth</code> times and write it to the buffer where we track
generated code.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">code</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">indent</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Array</span><span class="p">(</span><span class="nx">depth</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="s1">'  '</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">indent</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
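<p>Here is a small, self-contained sketch of <code>emit</code> in use.
Two things in it are assumptions rather than ulisp's code: indentation
is built with <code>String.prototype.repeat</code> instead of the
<code>Array</code>/<code>join</code> idiom, and <code>getOutput</code>
is assumed to simply join the buffered lines with newlines.</p>

```javascript
class Compiler {
  constructor() {
    this.outBuffer = [];
  }

  emit(depth, code) {
    // Two spaces per level of depth.
    this.outBuffer.push('  '.repeat(depth) + code);
  }

  // Assumed implementation: join the buffered lines with newlines.
  getOutput() {
    return this.outBuffer.join('\n');
  }
}

const demo = new Compiler();
demo.emit(0, 'define i32 @main() {');
demo.emit(1, 'ret i32 0');
demo.emit(0, '}');
console.log(demo.getOutput());
```

<p>The body line is emitted at depth 1, so it comes out indented by
two spaces inside the function's braces.</p>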
<h4 id="compilebegin(ast,-destination,-scope)">compileBegin(ast, destination, scope)</h4><p>Our first compiler function actually does no code generation
itself. We'll call <code>compileExpression</code> on each item within
the begin block. And we'll pass the <code>destination</code> to the
last expression in the list so that the value of a begin block is set
to the value of its last expression. All other expressions will
receive a temporary variable to store results.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">body</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span>
<span class="w"> </span><span class="nx">expression</span><span class="p">,</span>
<span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">body</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="nx">destination</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(),</span>
<span class="w"> </span><span class="nx">scope</span><span class="p">,</span>
<span class="w"> </span><span class="p">),</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<p>Example:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">begin</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="c1">; returns 2</span>
</pre></div>
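<p>To see which destination each expression in a begin block ends up
with, the selection logic can be pulled out on its own. The
<code>planBegin</code> helper and the counter-based temporary names
below are hypothetical; the counter just stands in for
<code>scope.symbol()</code>.</p>

```javascript
// Hypothetical helper mirroring compileBegin's destination choice:
// returns which destination each expression in the block would get.
function planBegin(body, destination, nextTemp) {
  return body.map((expression, i) => ({
    expression,
    // Only the last expression writes to the block's destination.
    destination: i === body.length - 1 ? destination : nextTemp(),
  }));
}

// A simple counter standing in for scope.symbol().
let counter = 0;
const nextTemp = () => 'tmp' + ++counter;

console.log(planBegin([1, 2], 'result', nextTemp));
```

<p>For <code>(begin 1 2)</code>, the literal <code>1</code> is stored
in a throwaway temporary and only <code>2</code> lands in the
destination, which is why the block evaluates to 2.</p>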
<h4 id="compileexpression(ast,-destination,-scope)">compileExpression(ast, destination, scope)</h4><p>This is the most generic compile function. If the ast is a list
(representing a function call), it will pass compilation off to
<code>compileCall</code>. Otherwise, the only parts of the language
that are not function calls are variable references and numeric literals.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Is a nested function call, compile it</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">exp</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">exp</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// If numeric literal, store to destination register by adding 0.</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Number</span><span class="p">.</span><span class="nx">isInteger</span><span class="p">(</span><span class="nx">exp</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb"> = add i32 </span><span class="si">${</span><span class="nx">exp</span><span class="si">}</span><span class="sb">, 0`</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// If is local, store to destination register similarly.</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">exp</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">res</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb"> = add i32 %</span><span class="si">${</span><span class="nx">res</span><span class="si">}</span><span class="sb">, 0`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span>
<span class="w"> </span><span class="s1">'Attempt to reference undefined variable or unsupported literal: '</span><span class="w"> </span><span class="o">+</span>
<span class="w"> </span><span class="nx">exp</span><span class="p">,</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<p>Example:</p>
<div class="highlight"><pre><span></span><span class="mi">1</span>
<span class="o">...</span>
<span class="nv">a</span>
<span class="o">...</span>
<span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="nv">a</span><span class="p">)</span>
</pre></div>
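<p>To make the two leaf cases concrete, here is a minimal standalone
sketch. The <code>emitLeaf</code> function and the <code>locals</code>
map are hypothetical stand-ins for the <code>Compiler</code> method and
the scope object; they only illustrate the "add 0" trick, which is
needed because LLVM IR has no plain register-copy instruction.</p>

```javascript
// Hypothetical stand-in for the leaf cases of compileExpression.
// `locals` maps Lisp names to LLVM-safe register names (an assumption;
// the compiler in the post uses its scope object for this).
function emitLeaf(exp, destination, locals) {
  if (Number.isInteger(exp)) {
    // Materialize the literal in a register by adding 0 to it.
    return `%${destination} = add i32 ${exp}, 0`;
  }
  const register = locals.get(exp);
  if (register === undefined) {
    throw new Error('Attempt to reference undefined variable: ' + exp);
  }
  // Copy the local into the destination register the same way.
  return `%${destination} = add i32 %${register}, 0`;
}

const locals = new Map([['a', 'a']]);
console.log(emitLeaf(1, 'sym1', locals)); // %sym1 = add i32 1, 0
console.log(emitLeaf('a', 'sym2', locals)); // %sym2 = add i32 %a, 0
```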
<h4 id="compilecall(functionname,-arguments,-destination,-scope)">compileCall(functionName, arguments, destination, scope)</h4><p>Most function calls will automatically compile arguments before
calling the function. However, certain control-flow primitives don't
do this (e.g. <code>def</code>, <code>if</code>, etc.). Macros in Lisp
allow you to add new control-flow primitives (even if you don't use
them to modify control flow). But we will ignore user-defined
primitives for now.</p>
<p>We'll keep a list of control-flow primitives and pass off compilation
to them if the function name matches a primitive. Otherwise, we'll
look up the function name in scope (to find its safe name), compile
the arguments, and call the function with the results of the
arguments.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">validFunction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">fun</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">validFunction</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeArgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">args</span>
<span class="w"> </span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">a</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s1">'i32 %'</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">res</span><span class="p">;</span>
<span class="w"> </span><span class="p">})</span>
<span class="w"> </span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s1">', '</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb"> = call i32 @</span><span class="si">${</span><span class="nx">validFunction</span><span class="si">}</span><span class="sb">(</span><span class="si">${</span><span class="nx">safeArgs</span><span class="si">}</span><span class="sb">)`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Attempt to call undefined function: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fun</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<p>Yay LLVM for simplifying calls!</p>
<p>Example:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">foo</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span>
<span class="o">...</span>
<span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span>
</pre></div>
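<p>As a rough illustration of the non-primitive path, the sketch below
shows the order in which instructions come out for
<code>(foo 1)</code>: each argument is compiled into its own temporary
first, and the <code>call</code> follows. The names
<code>emitCall</code>, <code>nextSymbol</code> and
<code>functions</code> are hypothetical, and arguments are limited to
integer literals to keep it short.</p>

```javascript
// Standalone sketch of the non-primitive path in compileCall.
// `functions` and `nextSymbol` are hypothetical stand-ins for the scope.
let counter = 0;
const nextSymbol = () => 'sym' + ++counter;

function emitCall(fun, args, destination, functions, out) {
  const safeName = functions.get(fun);
  if (safeName === undefined) {
    throw new Error('Attempt to call undefined function: ' + fun);
  }
  // Each argument gets its own temporary, emitted before the call itself.
  const safeArgs = args.map((arg) => {
    const res = nextSymbol();
    out.push(`  %${res} = add i32 ${arg}, 0`); // integer literals only
    return 'i32 %' + res;
  });
  out.push(`  %${destination} = call i32 @${safeName}(${safeArgs.join(', ')})`);
}

const out = [];
emitCall('foo', [1], 'result', new Map([['foo', 'foo']]), out);
console.log(out.join('\n'));
```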
<h4 id="compiledefine([functionname,-parameters,-...body],-destination,-scope)">compileDefine([functionName, parameters, ...body], destination, scope)</h4><p>This is the last undefined compile function we've used. The call
signature may look funny but we write less code if we keep the
primitive signatures the same. In any case, JavaScript's destructuring
makes it pretty enough.</p>
<p>Aside from code generation, we also need to add the function itself to
scope so we can look it up later when it is called. We also need to
create a copy of the current scope for the body of the function, and
we'll add the parameter names themselves to that child scope.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Add this function to outer scope</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">register</span><span class="p">(</span><span class="nx">name</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren't exposed in outer scope.</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">copy</span><span class="p">();</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeParams</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">param</span><span class="p">)</span><span class="w"> </span><span class="p">=></span>
<span class="w"> </span><span class="c1">// Store parameter mapped to associated local</span>
<span class="w"> </span><span class="nx">childScope</span><span class="p">.</span><span class="nx">register</span><span class="p">(</span><span class="nx">param</span><span class="p">),</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span>
<span class="w"> </span><span class="mf">0</span><span class="p">,</span>
<span class="w"> </span><span class="sb">`define i32 @</span><span class="si">${</span><span class="nx">safeName</span><span class="si">}</span><span class="sb">(</span><span class="si">${</span><span class="nx">safeParams</span>
<span class="w"> </span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">p</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="sb">`i32 %</span><span class="si">${</span><span class="nx">p</span><span class="si">}</span><span class="sb">`</span><span class="p">)</span>
<span class="w"> </span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s1">', '</span><span class="p">)</span><span class="si">}</span><span class="sb">) {`</span><span class="p">,</span>
<span class="w"> </span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Pass childScope in for reference when body is compiled.</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">ret</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">childScope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">body</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">childScope</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`ret i32 %</span><span class="si">${</span><span class="nx">ret</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">'}\n'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<p>Example:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">plus</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="w"> </span><span class="nv">b</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="nv">b</span><span class="p">))</span>
</pre></div>
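<p>The scope object itself is not shown in this section, so the
following is only a guess at its shape: a hypothetical
<code>Scope</code> with just enough behavior to show why
<code>compileDefine</code> copies the scope before registering
parameters. The name-mangling rule (non-word characters become
<code>_</code>) is an assumption chosen to match the
<code>@plus_two</code> name in the generated code below.</p>

```javascript
// Hypothetical Scope; the post's real implementation may differ.
class Scope {
  constructor(locals = {}) {
    this.locals = { ...locals };
  }

  // Map a Lisp name to an LLVM-safe identifier (assumed rule).
  register(name) {
    const safe = name.replace(/[^a-zA-Z0-9_]/g, '_');
    this.locals[name] = safe;
    return safe;
  }

  get(name) {
    return this.locals[name];
  }

  // Shallow copy: the child sees outer names, but not the other way around.
  copy() {
    return new Scope(this.locals);
  }
}

const outer = new Scope();
outer.register('plus-two');
const child = outer.copy();
child.register('a');
console.log(child.get('plus-two')); // plus_two
console.log(outer.get('a')); // undefined
```

<p>This sketch ignores collisions between mangled names (for example
<code>plus-two</code> and <code>plus_two</code>), which a real
implementation would need to handle.</p>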
<h4 id="compileop(op)">compileOp(op)</h4><p>The last function mentioned above will help us expose some useful
primitives. It takes the name of a builtin LLVM operation as a string
and returns a function that generates code whenever the operation
is called.</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="nx">compileOp</span><span class="p">(</span><span class="nx">op</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">([</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">arg1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">arg2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">arg1</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">arg2</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb"> = </span><span class="si">${</span><span class="nx">op</span><span class="si">}</span><span class="sb"> i32 %</span><span class="si">${</span><span class="nx">arg1</span><span class="si">}</span><span class="sb">, %</span><span class="si">${</span><span class="nx">arg2</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<p>This allows us to add some builtin ops as primitives (even though they
don't modify control flow).</p>
<div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'+'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'add'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'-'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'sub'</span><span class="p">),</span>
<span class="w"> </span><span class="s1">'*'</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">'mul'</span><span class="p">),</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">...</span>
<span class="p">}</span>
</pre></div>
<p>Example:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span>
</pre></div>
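<p>The closure-factory pattern can be tried in isolation. The sketch
below is a simplified, hypothetical <code>makeOp</code> that only
accepts integer literals and derives its temporary names from the
destination instead of asking the scope for fresh symbols.</p>

```javascript
// Closure factory in the style of compileOp, restricted to integer literals.
// Temporary names are derived from the destination (an assumption made
// here so the sketch needs no scope object).
function makeOp(op) {
  return ([a, b], destination) => [
    `%${destination}_a = add i32 ${a}, 0`,
    `%${destination}_b = add i32 ${b}, 0`,
    `%${destination} = ${op} i32 %${destination}_a, %${destination}_b`,
  ];
}

const ops = { '+': makeOp('add'), '-': makeOp('sub'), '*': makeOp('mul') };
console.log(ops['+']([1, 2], 'sym1').join('\n'));
```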
<h3 id="hello-world!">Hello world!</h3><p>Putting it all together, we'll compile this Lisp program:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">plus-two</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="w"> </span><span class="nv">b</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="mi">2</span><span class="p">)))</span>
<span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">main</span><span class="w"> </span><span class="p">()</span>
<span class="w"> </span><span class="p">(</span><span class="nv">plus-two</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="p">(</span><span class="nv">plus-two</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mi">1</span><span class="p">)))</span>
</pre></div>
<p>To get 9.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/function_definition.lisp
$<span class="w"> </span>./build/prog
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">9</span>
</pre></div>
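<p>As a sanity check on the expected exit code, here is a small
reference interpreter. It is not part of ulisp; it assumes the parsed
program is a nested array of strings and integers (the shape
<code>compileExpression</code> tests for) and supports only
<code>def</code>, <code>+</code>, and user-defined calls.</p>

```javascript
// Reference interpreter, not part of ulisp: evaluates the assumed AST
// shape (nested arrays of strings and integers) directly.
function evaluate(exp, env, functions) {
  if (Number.isInteger(exp)) return exp;
  if (!Array.isArray(exp)) {
    if (!env.has(exp)) throw new Error('Undefined variable: ' + exp);
    return env.get(exp);
  }
  const [fun, ...args] = exp;
  if (fun === 'def') {
    const [name, params, body] = args;
    functions.set(name, { params, body });
    return 0;
  }
  const values = args.map((arg) => evaluate(arg, env, functions));
  if (fun === '+') return (values[0] + values[1]) | 0; // wrap like an i32
  const target = functions.get(fun);
  if (!target) throw new Error('Undefined function: ' + fun);
  // Bind parameters to argument values in a fresh environment.
  const local = new Map(target.params.map((param, i) => [param, values[i]]));
  return evaluate(target.body, local, functions);
}

const program = [
  ['def', 'plus-two', ['a', 'b'], ['+', 'a', ['+', 'b', 2]]],
  ['def', 'main', [], ['plus-two', 3, ['plus-two', 1, 1]]],
];
const functions = new Map();
for (const form of program) evaluate(form, new Map(), functions);
const exitCode = evaluate(['main'], new Map(), functions);
console.log(exitCode); // 9
```

<p>The inner call gives 1 + (1 + 2) = 4, and the outer call gives
3 + (4 + 2) = 9.</p>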
<h4 id="generated-code">Generated code</h4><p>The generated LLVM can be found in <code>./build/prog.ll</code>:</p>
<div class="highlight"><pre><span></span><span class="k">define</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="vg">@plus_two</span><span class="p">(</span><span class="kt">i32</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">%sym7</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym9</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%b</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym10</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym8</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym9</span><span class="p">,</span><span class="w"> </span><span class="nv">%sym10</span>
<span class="w"> </span><span class="nv">%sym6</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym7</span><span class="p">,</span><span class="w"> </span><span class="nv">%sym8</span>
<span class="w"> </span><span class="k">ret</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym6</span>
<span class="p">}</span>
<span class="k">define</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="vg">@main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nv">%sym6</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym8</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym9</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">0</span>
<span class="w"> </span><span class="nv">%sym7</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="vg">@plus_two</span><span class="p">(</span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym8</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym9</span><span class="p">)</span>
<span class="w"> </span><span class="nv">%sym5</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="vg">@plus_two</span><span class="p">(</span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym6</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym7</span><span class="p">)</span>
<span class="w"> </span><span class="k">ret</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym5</span>
<span class="p">}</span>
</pre></div>
<p>You can see all the unnecessary <code>add ..., 0</code>
instructions. But let's look at the x86 assembly that LLVM generates
in <code>build/prog.s</code>:</p>
<div class="highlight"><pre><span></span><span class="nf">...</span>
<span class="nl">_plus_two:</span><span class="w"> </span><span class="c1">## @plus_two</span>
<span class="w"> </span><span class="nf">.cfi_startproc</span>
<span class="c1">## %bb.0:</span>
<span class="w"> </span><span class="c1">## kill: def $esi killed $esi def $rsi</span>
<span class="w"> </span><span class="c1">## kill: def $edi killed $edi def $rdi</span>
<span class="w"> </span><span class="nf">leal</span><span class="w"> </span><span class="mi">2</span><span class="p">(</span><span class="o">%</span><span class="nb">rdi</span><span class="p">,</span><span class="o">%</span><span class="nb">rsi</span><span class="p">),</span><span class="w"> </span><span class="o">%</span><span class="nb">eax</span>
<span class="w"> </span><span class="nf">retq</span>
<span class="w"> </span><span class="nf">.cfi_endproc</span>
<span class="w"> </span><span class="c1">## -- End function</span>
<span class="nf">...</span>
</pre></div>
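<p>And we see that LLVM easily optimized the inefficiencies away: the
whole body of <code>plus_two</code> collapses into a single
<code>leal</code> that computes <code>a + b + 2</code>. :)</p>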
<h3 id="next-up">Next up</h3><ul>
<li>Compiling conditionals</li>
<li>Tail call optimization</li>
<li>Lists and dynamic memory</li>
<li>Strings?</li>
<li>Foreign function calls?</li>
<li>Self-hosting?</li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Adding an LLVM backend to ulisp (small Lisp compiler in JavaScript) <a href="https://t.co/VIddKW1r3N">https://t.co/VIddKW1r3N</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1104795606365757442?ref_src=twsrc%5Etfw">March 10, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/compiler-basics-llvm.htmlSun, 10 Mar 2019 00:00:00 +0000
- AOT-compilation of Javascript with V8http://notes.eatonphil.com/aot-compilation-of-javascript-with-v8.html<p>tl;dr: I'm working on an AOT-compiled Javascript implementation called
<a href="https://github.com/eatonphil/jsc">jsc</a>.</p>
<p>Many dynamically typed programming languages have implementations that
compile to native binaries:</p>
<ul>
<li>Python: <a href="https://cython.org/">Cython</a></li>
<li>Common Lisp: <a href="http://www.sbcl.org/">SBCL</a></li>
<li>Scheme: <a href="https://www.call-cc.org/">Chicken Scheme</a></li>
</ul>
<p>The benefits of compiling dynamically typed languages are similar to
those of compiling statically typed languages:</p>
<ul>
<li>Simplified deployment via a single binary</li>
<li>Simplified foreign-function interfaces<ul>
<li>e.g. <a href="https://wiki.call-cc.org/An%20extended%20FFI%20example">Embedded C/C++ strings</a></li>
</ul>
</li>
<li>More predictable performance than JIT-compiling interpreters</li>
<li>Performance gains over non-JIT-compiling interpreters</li>
</ul>
<p>I (re)discovered a common technique for compiling dynamic languages
while developing <a href="https://github.com/eatonphil/bsdscheme">BSDScheme</a>,
an interpreter and compiler for Scheme. In this technique, you use
core parts of the runtime code as a library that is imported and
referenced by compiled code.</p>
<p>When an interpreter already exists, you avoid rebuilding
object-memory representations, memory management, operations, and
interoperability with existing libraries. With the runtime as a
library (plus an existing parser frontend), you can focus solely on
generating code for control flow.</p>
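<p>To make the runtime-as-a-library idea concrete, here is a minimal
sketch in Javascript. It is not how jsc or BSDScheme is implemented;
the <code>runtime</code> object, the <code>generate</code> function, and the AST shape are
all hypothetical names made up for illustration. The point is only
that the generated code calls into helpers that already exist rather
than reimplementing them.</p>

```javascript
// Hypothetical runtime library. In a real system these helpers would come
// from the existing interpreter (value representation, operators, etc.).
const runtime = {
  add(a, b) { return a + b; },
  sub(a, b) { return a - b; },
  looseEquals(a, b) { return a == b; },
};

// Hypothetical code generator: turns a tiny expression AST into source text
// that only calls runtime helpers instead of reimplementing them.
function generate(node) {
  switch (node.type) {
    case "number":
      return String(node.value);
    case "identifier":
      return node.name;
    case "binary": {
      const helper = { "+": "add", "-": "sub", "==": "looseEquals" }[node.op];
      if (!helper) throw new Error("unsupported operator: " + node.op);
      return `rt.${helper}(${generate(node.left)}, ${generate(node.right)})`;
    }
    default:
      throw new Error("unsupported node: " + node.type);
  }
}

// AST for the expression `n - 1`.
const ast = {
  type: "binary",
  op: "-",
  left: { type: "identifier", name: "n" },
  right: { type: "number", value: 1 },
};

const source = generate(ast);

// "Link" the generated code against the runtime library and run it.
const compiled = new Function("rt", "n", `return ${source};`);

console.log(source);                // rt.sub(n, 1)
console.log(compiled(runtime, 50)); // 49
```

<p>In this sketch the generator knows nothing about how subtraction
works. It only decides which helper to call and in what order, which
is the part of the work that is left once a runtime already exists.</p>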
<h3 id="the-first-pass">The first pass</h3><p>I wrote the initial version of <a href="https://github.com/eatonphil/jsc">jsc</a>
in Rust using Dave Herman's
<a href="https://github.com/dherman/esprit">esprit</a> parser (supports a subset
of ES6 that includes all of ES5).</p>
<p>The interesting parts of the runtime are taken care of by V8, e.g.:</p>
<ul>
<li><code>v8::String</code> - a Javascript string object<ul>
<li><code>v8::String::NewFromUtf8(isolate, "hello world!")</code> - C++ string to Javascript string object</li>
</ul>
</li>
<li><code>v8::Number</code> - a Javascript number object<ul>
<li><code>v8::Number::New(isolate, 10)</code> - C++ double to Javascript number object</li>
</ul>
</li>
<li>Heap allocations</li>
<li>Calling convention</li>
</ul>
<p>And so on.</p>
<h4 id="an-example">An example</h4><p>This first version of jsc could take the following Javascript:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">b</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">fib</span><span class="p">(</span><span class="mf">50</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">1</span><span class="p">));</span>
<span class="p">}</span>
</pre></div>
<p>And produce the following C++:</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;string&gt;</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;iostream&gt;</span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;node.h&gt;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Array</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Boolean</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Context</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Exception</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Function</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">FunctionTemplate</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">FunctionCallbackInfo</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Isolate</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Local</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Null</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Number</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Object</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">String</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">False</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">True</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Value</span><span class="p">;</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">fib_0</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o"><</span><span class="n">Value</span><span class="o">>&</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Isolate</span><span class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">n_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">a_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">b_3</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="nl">tail_recurse_4</span><span class="p">:</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Context</span><span class="o">></span><span class="w"> </span><span class="n">ctx_5</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isolate</span><span class="o">-></span><span class="n">GetCurrentContext</span><span class="p">();</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Object</span><span class="o">></span><span class="w"> </span><span class="n">global_6</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ctx_5</span><span class="o">-></span><span class="n">Global</span><span class="p">();</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">Boolean_7</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">>::</span><span class="n">Cast</span><span class="p">(</span><span class="n">global_6</span><span class="o">-></span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"Boolean"</span><span class="p">)));</span>
<span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_8</span><span class="p">(</span><span class="n">n_1</span><span class="p">);</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">string_tmp_9</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_8</span><span class="p">);</span>
<span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_10</span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">));</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">string_tmp_11</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_10</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">argv_12</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">(</span><span class="n">n_1</span><span class="o">-></span><span class="n">IsBoolean</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="o">-></span><span class="n">IsBoolean</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">n_1</span><span class="o">-></span><span class="n">ToBoolean</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="o">-></span><span class="n">ToBoolean</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">n_1</span><span 
class="o">-></span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">n_1</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">n_1</span><span class="o">-></span><span class="n">IsString</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span 
class="o">-></span><span class="n">IsString</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">string_tmp_9</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">string_tmp_11</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">False</span><span class="p">(</span><span class="n">isolate</span><span class="p">))))</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">result_13</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Boolean_7</span><span class="o">-></span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_12</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">result_13</span><span class="o">-></span><span class="n">ToBoolean</span><span class="p">()</span><span class="o">-></span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// return a;</span>
<span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">(</span><span class="n">a_2</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Context</span><span class="o">></span><span class="w"> </span><span class="n">ctx_14</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isolate</span><span class="o">-></span><span class="n">GetCurrentContext</span><span class="p">();</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Object</span><span class="o">></span><span class="w"> </span><span class="n">global_15</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ctx_14</span><span class="o">-></span><span class="n">Global</span><span class="p">();</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">Boolean_16</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">>::</span><span class="n">Cast</span><span class="p">(</span><span class="n">global_15</span><span class="o">-></span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"Boolean"</span><span class="p">)));</span>
<span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_17</span><span class="p">(</span><span class="n">n_1</span><span class="p">);</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">string_tmp_18</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_17</span><span class="p">);</span>
<span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_19</span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">));</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">string_tmp_20</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_19</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">argv_21</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">(</span><span class="n">n_1</span><span class="o">-></span><span class="n">IsBoolean</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-></span><span class="n">IsBoolean</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">n_1</span><span class="o">-></span><span class="n">ToBoolean</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-></span><span class="n">ToBoolean</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">n_1</span><span 
class="o">-></span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">n_1</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">n_1</span><span class="o">-></span><span class="n">IsString</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span 
class="o">-></span><span class="n">IsString</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">string_tmp_18</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">string_tmp_20</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">False</span><span class="p">(</span><span class="n">isolate</span><span class="p">))))</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">result_22</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Boolean_16</span><span class="o">-></span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_21</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">result_22</span><span class="o">-></span><span class="n">ToBoolean</span><span class="p">()</span><span class="o">-></span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// return b;</span>
<span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">(</span><span class="n">b_3</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// return fib(n - 1, b, a + b);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">arg_23</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">n_1</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">n_1</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Number</span><span class="o">>::</span><span 
class="n">Cast</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">arg_24</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b_3</span><span class="p">;</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">arg_25</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">a_2</span><span class="o">-></span><span class="n">IsString</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">b_3</span><span class="o">-></span><span class="n">IsString</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">>::</span><span class="n">Cast</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">Concat</span><span class="p">(</span><span class="n">a_2</span><span class="o">-></span><span class="n">ToString</span><span class="p">(),</span><span class="w"> </span><span class="n">b_3</span><span class="o">-></span><span class="n">ToString</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">>::</span><span class="n">Cast</span><span class="p">((</span><span class="n">a_2</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">b_3</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">a_2</span><span class="o">-></span><span 
class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">b_3</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Number</span><span class="o">>::</span><span class="n">Cast</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">)));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">FunctionTemplate</span><span class="o">></span><span class="w"> </span><span class="n">ftpl_27</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">fib_0</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">fn_26</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ftpl_27</span><span class="o">-></span><span class="n">GetFunction</span><span class="p">();</span>
<span class="w"> </span><span class="n">fn_26</span><span class="o">-></span><span class="n">SetName</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"fib_0"</span><span class="p">));</span>
<span class="w"> </span><span class="n">n_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arg_23</span><span class="p">;</span>
<span class="w"> </span><span class="n">a_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arg_24</span><span class="p">;</span>
<span class="w"> </span><span class="n">b_3</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arg_25</span><span class="p">;</span>
<span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">tail_recurse_4</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">jsc_main</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o"><</span><span class="n">Value</span><span class="o">>&</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Isolate</span><span class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span>
<span class="nl">tail_recurse_5</span><span class="p">:</span>
<span class="w"> </span><span class="c1">// console.log(fib(50, 0, 1))</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">dot_parent_7</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isolate</span><span class="o">-></span><span class="n">GetCurrentContext</span><span class="p">()</span><span class="o">-></span><span class="n">Global</span><span class="p">()</span><span class="o">-></span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"console"</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">String</span><span class="o">></span><span class="w"> </span><span class="n">property_8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"log"</span><span class="p">);</span>
<span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">dot_parent_7</span><span class="o">-></span><span class="n">IsObject</span><span class="p">()</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="o">!</span><span class="n">dot_parent_7</span><span class="p">.</span><span class="n">As</span><span class="o"><</span><span class="n">Object</span><span class="o">></span><span class="p">()</span><span class="o">-></span><span class="n">HasOwnProperty</span><span class="p">(</span><span class="n">isolate</span><span class="o">-></span><span class="n">GetCurrentContext</span><span class="p">(),</span><span class="w"> </span><span class="n">property_8</span><span class="p">).</span><span class="n">ToChecked</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">dot_parent_7</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dot_parent_7</span><span class="p">.</span><span class="n">As</span><span class="o"><</span><span class="n">Object</span><span class="o">></span><span class="p">()</span><span class="o">-></span><span class="n">GetPrototype</span><span class="p">();</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">dot_result_6</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dot_parent_7</span><span class="p">.</span><span class="n">As</span><span class="o"><</span><span class="n">Object</span><span class="o">></span><span class="p">()</span><span class="o">-></span><span class="n">Get</span><span class="p">(</span><span class="n">isolate</span><span class="o">-></span><span class="n">GetCurrentContext</span><span class="p">(),</span><span class="w"> </span><span class="n">property_8</span><span class="p">).</span><span class="n">ToLocalChecked</span><span class="p">();</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">arg_9</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">50</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">arg_10</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">arg_11</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">FunctionTemplate</span><span class="o">></span><span class="w"> </span><span class="n">ftpl_13</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">fib_0</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">fn_12</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ftpl_13</span><span class="o">-></span><span class="n">GetFunction</span><span class="p">();</span>
<span class="w"> </span><span class="n">fn_12</span><span class="o">-></span><span class="n">SetName</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"fib_0"</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">argv_14</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_9</span><span class="p">,</span><span class="w"> </span><span class="n">arg_10</span><span class="p">,</span><span class="w"> </span><span class="n">arg_11</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">result_15</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn_12</span><span class="o">-></span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="n">argv_14</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">arg_16</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">result_15</span><span class="p">;</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">fn_17</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">>::</span><span class="n">Cast</span><span class="p">(</span><span class="n">dot_result_6</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">argv_18</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_16</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">result_19</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn_17</span><span class="o">-></span><span class="n">Call</span><span class="p">(</span><span class="n">dot_parent_7</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_18</span><span class="p">);</span>
<span class="w"> </span><span class="n">result_19</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">Init</span><span class="p">(</span><span class="n">Local</span><span class="o"><</span><span class="n">Object</span><span class="o">></span><span class="w"> </span><span class="n">exports</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">NODE_SET_METHOD</span><span class="p">(</span><span class="n">exports</span><span class="p">,</span><span class="w"> </span><span class="s">"jsc_main"</span><span class="p">,</span><span class="w"> </span><span class="n">jsc_main</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">NODE_MODULE</span><span class="p">(</span><span class="n">NODE_GYP_MODULE_NAME</span><span class="p">,</span><span class="w"> </span><span class="n">Init</span><span class="p">)</span>
</pre></div>
<p>This output gets compiled (by jsc) as a <a href="https://nodejs.org/api/addons.html">Node
addon</a> using
<a href="https://github.com/nodejs/node-gyp">node-gyp</a>.</p>
<p>The compiled addon is loaded by a single-line JavaScript file generated by jsc:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>rm<span class="w"> </span>-rf<span class="w"> </span>build
$<span class="w"> </span>jsc<span class="w"> </span>fib.js
$<span class="w"> </span>cat<span class="w"> </span>build/fib.js
require<span class="o">(</span><span class="s2">"build/Release/fib.node"</span><span class="o">)</span>.jsc_main<span class="o">()</span>
$<span class="w"> </span>node<span class="w"> </span>build/fib.js
<span class="m">12586269025</span>
</pre></div>
<h4 id="analysis">Analysis</h4><p>The code was a mess of bad formatting, unnecessary locals, inefficient
basic operations (e.g. huge, often unnecessary Boolean conversions),
and so on. The unnecessary locals were partly a by-product of
single-pass code generation. And the unnecessary conversions were
partly due to ignoring types (even the types of literals, which you
don't need TypeScript/Flow to provide).</p>
<p>After I got this proof-of-concept working for basic examples, I wanted
to rewrite it around <a href="https://github.com/eatonphil/one-pass-code-generation-in-v8/blob/master/One-pass%20Code%20Generation%20in%20V8.pdf">destination-driven code
generation</a>,
a technique by Kent Dybvig used in V8's baseline compiler. And after a
few weeks of not getting far in a refactor in Rust, I rewrote the
compiler in TypeScript.</p>
<h3 id="the-second-pass">The second pass</h3><p>Written in TypeScript and using the <a href="https://github.com/Microsoft/TypeScript/wiki/Using-the-Compiler-API">TypeScript compiler
API</a>,
this second iteration was built to do destination-driven code
generation and leaf type propagation. Destination-driven code
generation tells each expression where to put its result, which lets a
single-pass code generator avoid redundant reassignments. And leaf type
propagation allows simple, obvious
optimizations such as just calling <code>V8::Boolean::IsTrue()</code>
on a statically-known boolean rather than calling
<code>V8::Value::Equals()</code>.</p>
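<p>To make the two ideas concrete, here is a minimal sketch in
TypeScript. It is not jsc's actual implementation: the tiny AST, the
names <code>compileExpr</code>, <code>compileTest</code>,
<code>operand</code> and <code>fresh</code>, and the fallback helper
<code>genericTruthy</code> are all hypothetical and exist only for
illustration. The caller picks the destination and the callee writes
straight into it, recording the leaf type it discovered along the
way.</p>

```typescript
// Hypothetical mini-AST for illustration; not jsc's real data structures.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "bool"; value: boolean }
  | { kind: "ident"; name: string }
  | { kind: "strictEq"; left: Expr; right: Expr };

type LeafType = "Number" | "Boolean" | "Value";

// A destination is the C++ local that must receive an expression's result.
// The callee fills in `type` once it knows what it produced.
interface Destination {
  name: string;
  type: LeafType;
}

let symbolCount = 0;
function fresh(prefix: string): Destination {
  symbolCount += 1;
  return { name: `sym_${prefix}_${symbolCount}`, type: "Value" };
}

// Identifiers already live somewhere, so they need no temporary at all.
function operand(e: Expr, prefix: string, out: string[]): string {
  if (e.kind === "ident") return e.name;
  const dest = fresh(prefix);
  compileExpr(e, dest, out);
  return dest.name;
}

// Destination-driven: emit code that leaves the value of `e` in `dest`,
// so the caller never has to copy a result into place afterwards.
function compileExpr(e: Expr, dest: Destination, out: string[]): void {
  switch (e.kind) {
    case "num":
      dest.type = "Number"; // leaf type known from the literal
      out.push(`Local<${dest.type}> ${dest.name} = Number::New(isolate, ${e.value});`);
      return;
    case "bool":
      dest.type = "Boolean";
      out.push(`Local<${dest.type}> ${dest.name} = ${e.value ? "True" : "False"}(isolate);`);
      return;
    case "ident":
      dest.type = "Value"; // nothing known statically
      out.push(`Local<${dest.type}> ${dest.name} = ${e.name};`);
      return;
    case "strictEq": {
      const lhs = operand(e.left, "lhs", out);
      const rhs = operand(e.right, "rhs", out);
      dest.type = "Boolean"; // a comparison always yields a boolean
      out.push(
        `Local<${dest.type}> ${dest.name} = ${lhs}->StrictEquals(${rhs}) ? True(isolate) : False(isolate);`,
      );
      return;
    }
  }
}

// Leaf type propagation: a statically known boolean gets the cheap check.
// `genericTruthy` is a made-up name standing in for a slower generic path.
function compileTest(test: Expr, out: string[]): string {
  const dest = fresh("anon");
  compileExpr(test, dest, out);
  return dest.type === "Boolean"
    ? `${dest.name}->IsTrue()`
    : `genericTruthy(isolate, ${dest.name})`;
}

// Usage: compile the test `n === 0`, where n already lives in args[0].
const out: string[] = [];
const cond = compileTest(
  {
    kind: "strictEq",
    left: { kind: "ident", name: "args[0]" },
    right: { kind: "num", value: 0 },
  },
  out,
);
out.push(`if (${cond}) { /* ... */ }`);
console.log(out.join("\n"));
```

<p>In this sketch the comparison result is written directly into the
local chosen by <code>compileTest</code>, and because that local is
known to hold a boolean the emitted condition is a plain
<code>IsTrue()</code> call.</p>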
<h4 id="example">Example</h4><p>Given the same Fibonacci JavaScript program from before, this
iteration produces the following C++:</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">"lib.cc"</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">tco_fib</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o"><</span><span class="n">Value</span><span class="o">>&</span><span class="w"> </span><span class="n">_args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Isolate</span><span class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">>></span><span class="w"> </span><span class="n">args</span><span class="p">(</span><span class="n">_args</span><span class="p">.</span><span class="n">Length</span><span class="p">());;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">Length</span><span class="p">();</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_args</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="nl">tail_recurse_0</span><span class="p">:</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Number</span><span class="o">></span><span class="w"> </span><span class="n">sym_rhs_4</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Boolean</span><span class="o">></span><span class="w"> </span><span class="n">sym_anon_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-></span><span class="n">StrictEquals</span><span class="p">(</span><span class="n">sym_rhs_4</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">True</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">False</span><span class="p">(</span><span class="n">isolate</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sym_anon_2</span><span class="o">-></span><span class="n">IsTrue</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Number</span><span class="o">></span><span class="w"> </span><span class="n">sym_rhs_11</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Boolean</span><span class="o">></span><span class="w"> </span><span class="n">sym_anon_9</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-></span><span class="n">StrictEquals</span><span class="p">(</span><span class="n">sym_rhs_11</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">True</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">False</span><span class="p">(</span><span class="n">isolate</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sym_anon_9</span><span class="o">-></span><span class="n">IsTrue</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Number</span><span class="o">></span><span class="w"> </span><span class="n">sym_rhs_19</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">sym_arg_17</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">genericMinus</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="n">sym_rhs_19</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">sym_arg_21</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">genericPlus</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span>
<span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sym_arg_17</span><span class="p">;</span>
<span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sym_arg_21</span><span class="p">;</span>
<span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">tail_recurse_0</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">jsc_main</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o"><</span><span class="n">Value</span><span class="o">>&</span><span class="w"> </span><span class="n">_args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Isolate</span><span class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">>></span><span class="w"> </span><span class="n">args</span><span class="p">(</span><span class="n">_args</span><span class="p">.</span><span class="n">Length</span><span class="p">());;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">Length</span><span class="p">();</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_args</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="nl">tail_recurse_1</span><span class="p">:</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Number</span><span class="o">></span><span class="w"> </span><span class="n">sym_arg_29</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Number</span><span class="o">></span><span class="w"> </span><span class="n">sym_arg_30</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Number</span><span class="o">></span><span class="w"> </span><span class="n">sym_arg_31</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">sym_args_32</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">sym_arg_29</span><span class="p">,</span><span class="w"> </span><span class="n">sym_arg_30</span><span class="p">,</span><span class="w"> </span><span class="n">sym_arg_31</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">sym_fn_33</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">tco_fib</span><span class="p">)</span><span class="o">-></span><span class="n">GetFunction</span><span class="p">();</span>
<span class="w"> </span><span class="n">sym_fn_33</span><span class="o">-></span><span class="n">SetName</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"tco_fib"</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">sym_arg_28</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sym_fn_33</span><span class="o">-></span><span class="n">Call</span><span class="p">(</span><span class="n">sym_fn_33</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="n">sym_args_32</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">sym_args_34</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">sym_arg_28</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">sym_parent_37</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isolate</span><span class="o">-></span><span class="n">GetCurrentContext</span><span class="p">()</span><span class="o">-></span><span class="n">Global</span><span class="p">()</span><span class="o">-></span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"console"</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">sym_anon_36</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sym_parent_37</span><span class="p">.</span><span class="n">As</span><span class="o"><</span><span class="n">Object</span><span class="o">></span><span class="p">()</span><span class="o">-></span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"log"</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">sym_fn_35</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">>::</span><span class="n">Cast</span><span class="p">(</span><span class="n">sym_anon_36</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">sym_anon_27</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sym_fn_35</span><span class="o">-></span><span class="n">Call</span><span class="p">(</span><span class="n">sym_fn_35</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">sym_args_34</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">Init</span><span class="p">(</span><span class="n">Local</span><span class="o"><</span><span class="n">Object</span><span class="o">></span><span class="w"> </span><span class="n">exports</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">NODE_SET_METHOD</span><span class="p">(</span><span class="n">exports</span><span class="p">,</span><span class="w"> </span><span class="s">"jsc_main"</span><span class="p">,</span><span class="w"> </span><span class="n">jsc_main</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">NODE_MODULE</span><span class="p">(</span><span class="n">NODE_GYP_MODULE_NAME</span><span class="p">,</span><span class="w"> </span><span class="n">Init</span><span class="p">)</span>
</pre></div>
<h4 id="analysis">Analysis</h4><p>Common code (<code>genericPlus</code>, <code>genericMinus</code>) and
all imports have been pulled into <code>lib.cc</code> for clarity. And
the entire result is run through
<a href="https://clang.llvm.org/docs/ClangFormat.html">clang-format</a> if it is
present on the system.</p>
<p>The benefit of leaf type propagation can be seen everywhere a local is
declared that is not <code>Local<Value></code> and specifically in if
tests on statically known booleans:</p>
<div class="highlight"><pre><span></span><span class="p">...</span>
<span class="n">Local</span><span class="o"><</span><span class="n">Boolean</span><span class="o">></span><span class="w"> </span><span class="n">sym_anon_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-></span><span class="n">StrictEquals</span><span class="p">(</span><span class="n">sym_rhs_4</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">True</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">False</span><span class="p">(</span><span class="n">isolate</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sym_anon_2</span><span class="o">-></span><span class="n">IsTrue</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="p">...</span>
</pre></div>
<p>It's obvious to a human that there is another optimization you could
do here by not wrapping this check in a <code>V8::Boolean</code> at
all. The only types tracked in destinations are V8 types, not yet C++
types. But not needing to pass this through a <code>bool
toBoolean(Value v)</code> wrapper is still an improvement.</p>
<p>In general, unboxing has not really been explored. But the ultimate
goal is to use Typescript types to produce function- or block-level
unboxed versions -- perhaps using a toggle in code to specify safety à
la Common Lisp.</p>
<h3 id="next-steps">Next steps</h3><p>I broke tests and regressed on syntax support in the Typescript port,
so that's the first step. The second step is enough syntax to support
more interesting benchmarks than the fibonacci example (which has
comparable performance to Node.js/V8 but that isn't saying much).</p>
<p>After that:</p>
<ul>
<li>Unboxed expressions</li>
<li>Unboxed blocks</li>
<li>Foreign-function interface</li>
<li>Self-hosting</li>
<li>Node-API compatible runtime without Node</li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Companion blog post to my talk on an AOT-compiled Javascript implementation built on Typescript <a href="https://t.co/0aHVJ9UzYh">https://t.co/0aHVJ9UzYh</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1100397733867859968?ref_src=twsrc%5Etfw">February 26, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/aot-compilation-of-javascript-with-v8.htmlTue, 26 Feb 2019 00:00:00 +0000
- Transparency and communication on small teamshttp://notes.eatonphil.com/transparency-and-communication-on-small-teams.html<p>I saw a post on
<a href="https://dev.to/vcarl/symptoms-of-a-dysfunctional-team-1c0">dev.to</a>
that talks about dysfunctional teams. This is a response that focuses
specifically on how to prevent burnout from overworking. This is aimed
at senior/lead engineers and engineering/project managers -- because
everyone in a leadership role is responsible for the health of the
team and the company.</p>
<p>In an otherwise good company with hard-working, ethical employees,
overworking happens because of imperfect communication. If neither of
those premises holds, you have more serious issues and have no need for
this post.</p>
<p>The primary subjects of poor communication are:</p>
<ul>
<li>Capacity/capabilities</li>
<li>Priorities</li>
<li>Results</li>
</ul>
<p>If any member of the team (or worse, the entire team) is not honestly
reporting on their capacity and capability, this will drive them to
overwork to make up for what they couldn't accomplish during work hours.</p>
<p>If any member of the team (or worse, the entire team) is not honestly
and publicly reporting on what they understand to be the priorities,
they will end up needing to work overtime if true priorities become
apparent too late.</p>
<p>And if any member of the team (or worse, the entire team) is not
honestly and publicly reporting on what they <strong>accomplished</strong>, they
will end up needing to work overtime if discrepancies become apparent
too late.</p>
<h3 id="solution">Solution</h3><p>Put a sprint process in place and schedule <strong>at least</strong> one meeting
every sprint. Discover every political, technical, and structural
stakeholder and find a time they can attend this meeting. At this
meeting you will cover at a high level (perhaps with some demos) what
was accomplished in the sprint and what you intend to accomplish in
the next sprint.</p>
<p>If any stakeholder cannot make this meeting, find a time to sync up
with him/her separately.</p>
<p>Your sprints should not last more than two weeks because any longer is
too long to go before talking to/reviewing with your stakeholders.</p>
<p>Finally, publish a report on what you accomplished this sprint (and
also what you did not accomplish!) and what you plan to accomplish the
next sprint. For example, I send an email to the engineering
organization with two docs at the end of each sprint: 1) a review doc
listing tasks accomplished/not accomplished and 2) a list of tasks
planned for the next sprint. This gives your stakeholders (and anyone
else interested) an opportunity to review the contents of the meeting
at their leisure.</p>
<p>Doing this can be difficult and embarrassing at first. Hard-working,
ethical employees never want to be seen as not accomplishing their
share of work. But the most important thing for the mid-to-long-term
health of these employees is to get them reporting honestly.</p>
<p>This helps make it clear where these employees can legitimately
improve (i.e. receive more training) and where it's necessary to hire
more or different employees. You'll likely need to put pressure on
every team member to report honestly and to do so without fear.</p>
<p>And as a result of doing this, you've done everything you can as a
senior/lead member of a small team to push responsibility for your
team's work up to your stakeholders. This is the best position to be
in.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">tldr; don't let your folks overwork unnecessarily when you could be reporting more frequently/honestly on understood priorities and accomplishments achieved/not achieved <a href="https://t.co/PeTe2Bq0Xz">https://t.co/PeTe2Bq0Xz</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1087722536236957697?ref_src=twsrc%5Etfw">January 22, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/transparency-and-communication-on-small-teams.htmlTue, 22 Jan 2019 00:00:00 +0000
- Windowshttp://notes.eatonphil.com/windows.html<p>It has been six years since I last used Windows for any remotely
serious software development. I've used Ubuntu, Arch, or FreeBSD
since. But eventually I spent so much time working around common
workplace tasks that I decided to put Windows 10 Pro on my work
laptop.</p>
<h3 id="windows-subsystem-for-linux">Windows Subsystem for Linux</h3><p>Introduced in 2016, this technology allows Windows to run unmodified
Linux binaries. The core feat being <a href="https://blogs.msdn.microsoft.com/wsl/2016/06/08/wsl-system-calls/">syscall
translation</a>.</p>
<p>It works nearly flawlessly. This means I can do all my Go, Node,
PostgreSQL development on Windows without a virtual machine using
bash, tmux, git, emacs, etc.</p>
<p>I've seen a few minor exceptions over the course of regular software
development in WSL:</p>
<ul>
<li><a href="https://github.com/Microsoft/WSL/issues/2249">ss/netstat does not work</a></li>
<li><a href="https://github.com/hashicorp/vagrant/issues/8700">vagrant does not work</a></li>
</ul>
<p>More generally, Linux programs are heavily file-oriented. And Windows
I/O <a href="https://github.com/Microsoft/WSL/issues/873#issuecomment-425272829">is not designed well for
that</a>.
In the worst cases (installing/adding Node packages) it can take
minutes to do operations that would take Linux seconds.</p>
<h3 id="vagrant">Vagrant</h3><p>Vagrant-Windows interoperability is abysmal.</p>
<p>As noted above, you cannot manage Hyper-V from vagrant within WSL. So
you're stuck using Powershell. Even then, managing synced files from
vagrant is a nightmare. The default sync method requires you to sign
in using your <strong>Windows Live</strong> username and password on every
reboot. But Node package installation attempts some file operations
that are not supported over the default synced, network filesystem.</p>
<p>When I switched to rsync, vagrant wouldn't reliably sync when the
virtual machine went down and came back up.</p>
<p>After hours of trying to get some files synced with vagrant I gave up.</p>
<h3 id="hyper-v">Hyper-V</h3><p>Hyper-V's GUI is much more complex/feature-complete than VirtualBox.
It even provides an Ubuntu quick-install that I used to jump right in.
I don't recommend using this though because it gives you no option but
an 11GB hard disk. I didn't realize this until I went through an hour
or two of post-install customization only to run out of space. Too
lazy to boot into a live CD to grow the root filesystem I reinstalled
with a more suitable 64GB drive and went through the hour-long
post-install customization process again.</p>
<p>Networking in Hyper-V is more complex/feature-complete than VirtualBox
as well. To access a Hyper-V machine you must create a new virtual
network interface manually and associate it. Static IP addresses appear
to be controlled at the host networking level (e.g. Control Panel)
instead of within the Hyper-V interface. This highlights how these
virtual interfaces are first-class, but overcomplicates the process of
getting started.</p>
<p>Ultimately I gave up on a static IP address and decided to reboot less
frequently.</p>
<p>Performance-wise Hyper-V machines are exactly as expected: excellent.</p>
<h3 id="misc">Misc</h3><p>Docker support on Windows needs work. It took me a while to understand
how Docker interacts with the WSL filesystem and what I needed to do
to allow Docker to mount. The complexity is similar on macOS when you
want to mount privileged directories like /var, but the experience is
worse on Windows.</p>
<p>Apparently Windows does have tiling window managers, but I have not
tried one out yet.</p>
<p>Powershell, a language with real types, is pretty compelling. But I
have not spent enough time with it to be efficient. And since WSL is
mostly good enough I don't really plan to.</p>
<p>Windows doesn't allow you to delete any files that are "in use". This
is kinda cool except that the errors you get when trying to delete
files that are in use are useless. They are even more useless when you
get the plain "could not delete directory" when you try to delete a
directory with some file inside it that is in use. I had to start
deleting files within by hand until I found the one that was in
use.</p>
<h3 id="conclusion">Conclusion</h3><p>If you have never run Linux or FreeBSD, don't use this post as an
excuse not to. You should run Linux or FreeBSD for the experience. But
if you've reached diminishing returns in your Linux/FreeBSD use,
Windows as a development environment has come a long way. It may be
the best platform available for software development, the profession.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Some notes on my experience having replaced Arch Linux with Windows on my work laptop <a href="https://t.co/8asxZmspwR">https://t.co/8asxZmspwR</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1086994000182153222?ref_src=twsrc%5Etfw">January 20, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/windows.htmlSun, 20 Jan 2019 00:00:00 +0000
- Writing a lisp compiler from scratch in JavaScript: 2. user-defined functions and variableshttp://notes.eatonphil.com/compiler-basics-functions.html<p class="note">
Previously in compiler basics:
<! forgive me, for I have sinned >
<br />
<a href="/compiler-basics-lisp-to-assembly.html">1. lisp to assembly</a>
<br/>
<br/>
Next in compiler basics:
<br/>
<a href="/compiler-basics-llvm.html">3. LLVM</a>
<br />
<a href="/compiler-basics-llvm-conditionals.html">4. LLVM conditionals and compiling fibonacci</a>
<br />
<a href="/compiler-basics-llvm-system-calls.html">5. LLVM system calls</a>
<br />
<a href="/compiler-basics-an-x86-upgrade.html">6. an x86 upgrade</a>
</p><p>In this post we'll extend the compiler to support defining functions
and variables. Additionally, we'll require the program's entrypoint to
be within a <code>main</code> function.</p>
<p>The resulting code can be found
<a href="https://github.com/eatonphil/ulisp">here</a>.</p>
<h3 id="function-definition">Function definition</h3><p>The simplest function definition we need to support is for our <code>main</code>
function. This will look like this:</p>
<div class="highlight"><pre><span></span><span class="nv">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="nv">basic.lisp</span>
<span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">main</span><span class="w"> </span><span class="p">()</span>
<span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mi">2</span><span class="p">))</span>
</pre></div>
<p>Where compiling and running it should produce a return code of 3:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span>basic.lisp
$<span class="w"> </span>./build/a.out
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">3</span>
</pre></div>
<h3 id="parsing-function-definitions">Parsing function definitions</h3><p>The entire language is defined in S-expressions and we already parse
S-expressions.</p>
<div class="highlight"><pre><span></span><span class="nx">$</span><span class="w"> </span><span class="nx">node</span>
<span class="o">></span><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">parse</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'./parser'</span><span class="p">);</span>
<span class="o">></span><span class="w"> </span><span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">parse</span><span class="p">(</span><span class="s1">'(def main () (+ 1 2))'</span><span class="p">));</span>
<span class="s1">'[[["def","main",[],["+",1,2]]],""]'</span>
</pre></div>
<p>So we're done!</p>
<h3 id="code-generation">Code generation</h3><p>There are three tricky parts to code generation once function
definitions are introduced:</p>
<ul>
<li>Function definitions are not expressions (in assembly)</li>
<li>Function calling conventions for the <strong>callee</strong></li>
<li>Variable scope</li>
</ul>
<h4 id="function-definitions">Function definitions</h4><p>A function definition looks like a function call. So we'll need to
keep a list of "primitive" functions that handle what looks like
function calls differently.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// TODO</span>
<span class="p">}</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">primitive_functions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">,</span>
<span class="p">};</span>
</pre></div>
<p>Then in our <code>compile_call</code> function we need to see if the function
being "called" is in this list. If so, we allow the associated
callback to handle compilation.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">primitive_functions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">primitive_functions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Save param registers</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span>
<span class="w"> </span><span class="c1">// Compile arguments and store them in param registers</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">]));</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`CALL </span><span class="si">${</span><span class="nx">BUILTIN_FUNCTIONS</span><span class="p">[</span><span class="nx">fun</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">fun</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Restore param registers</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">destination</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">destination</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb">, RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
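<p>To make the save/restore order concrete, here is a minimal sketch
that collects the lines such a call would emit. The
<code>sketchCallFrame</code> helper is hypothetical and not part of the
compiler. It assumes the parameter registers are RDI, RSI, and RDX in
that order, and it skips compiling the arguments themselves.</p>

```javascript
// Hypothetical stand-in: assumes the register order RDI, RSI, RDX.
// Emitted lines are collected in an array instead of being printed.
const PARAM_REGISTERS = ['RDI', 'RSI', 'RDX'];

function sketchCallFrame(label, argCount) {
  const lines = [];
  // Save the registers that are about to be overwritten, in order
  for (let i = 0; i < argCount; i++) {
    lines.push(`PUSH ${PARAM_REGISTERS[i]}`);
  }
  lines.push(`CALL ${label}`);
  // The stack is last-in, first-out, so restore in reverse order
  for (let i = argCount - 1; i >= 0; i--) {
    lines.push(`POP ${PARAM_REGISTERS[i]}`);
  }
  return lines;
}

console.log(sketchCallFrame('plus', 2).join('\n'));
```

<p>For two arguments this prints PUSH RDI, PUSH RSI, the CALL, then POP
RSI, POP RDI. The pops run in reverse because the last register pushed
is the first one to come back off the stack.</p>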
<p>Now we can begin thinking about <code>compile_define</code>. It takes <code>args</code>
which will be a list of three elements containing the function's:</p>
<ul>
<li>name</li>
<li>parameters</li>
<li>and body</li>
</ul>
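<p>As a small illustration, assuming <code>args</code> is everything
after the leading <code>def</code> symbol in the parsed form shown
earlier, array destructuring names the three parts directly:</p>

```javascript
// Assumed shape of `args` for `(def main () (+ 1 2))`:
// the parsed form with the leading "def" symbol removed.
const args = ['main', [], ['+', 1, 2]];

// Array destructuring gives each of the three elements a name
const [name, parameters, body] = args;

console.log(name);       // main
console.log(parameters); // []
console.log(body);       // [ '+', 1, 2 ]
```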
<p class="note">
It does not use destination because we're treating function
definitions as statements for now and not as expressions. If we were
treating it as an expression, we might store the address of the
function in the destination register.
We keep destination around to keep the primitive function signatures
consistent.
</p><p>Based on how we called functions before and how we defined the
hard-coded <code>add</code> function, we know what a function definition in
assembly generally looks like. And we know the arguments to the
function when called will be in RDI, RSI, and RDX.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">parameters</span><span class="p">,</span><span class="w"> </span><span class="nx">body</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Function name becomes a label we can CALL</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">name</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Something to do with RDI, RSI, RDX and the parameters variable?</span>
<span class="w"> </span><span class="c1">// We renamed compile_argument to compile_expression to be more general</span>
<span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Maybe some cleanup to do with RDI, RSI, RDX?</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'RET\n'</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Not a bad first sketch. But how do we match up <code>RDI</code>, <code>RSI</code>, <code>RDX</code> and
the user-defined <code>parameters</code> variable names? For example in the
following:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">plus-two</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="mi">2</span><span class="p">))</span>
</pre></div>
<p>It's clear to us that <code>a</code> must match up to <code>RDI</code>. In order to do this
we need to track all variables in a <code>scope</code> dictionary mapping the
variable name to the register where it's stored.</p>
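<p>A rough sketch of building such a dictionary from a parameter list
might look like the following. The <code>scopeFromParameters</code>
helper is hypothetical, and the register order (RDI, RSI, RDX) is
assumed from the calling convention described above.</p>

```javascript
// Assumed register order for the first three parameters
const PARAM_REGISTERS = ['RDI', 'RSI', 'RDX'];

// Hypothetical helper: map each parameter name to the register that
// holds its value when the function is called.
function scopeFromParameters(parameters) {
  if (parameters.length > PARAM_REGISTERS.length) {
    throw new Error('Too many parameters for this sketch');
  }
  const scope = {};
  parameters.forEach((param, i) => {
    scope[param] = PARAM_REGISTERS[i];
  });
  return scope;
}

console.log(scopeFromParameters(['a'])); // { a: 'RDI' }
```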
<p>Additionally, keeping track of scope can help us fail quickly in the
following scenario:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">plus-two</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="mi">2</span><span class="p">))</span>
</pre></div>
<p>The variable <code>b</code> is used but never defined. It has not been added to
the scope dictionary. So our compiler can fail quickly saying there is
an undefined variable being referenced.</p>
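<p>One way to picture that check is a lookup that throws when a name is
missing. This is only a sketch: the <code>lookup</code> helper and its
error message are hypothetical.</p>

```javascript
// Hypothetical helper: resolve a name to its register, or fail at
// compile time if the name was never added to the scope.
function lookup(name, scope) {
  if (!Object.prototype.hasOwnProperty.call(scope, name)) {
    throw new Error('Undefined variable: ' + name);
  }
  return scope[name];
}

const scope = { a: 'RDI' };
console.log(lookup('a', scope)); // RDI

try {
  lookup('b', scope);
} catch (e) {
  console.log(e.message); // Undefined variable: b
}
```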
<p>Taking this a step further, what if we want to catch the following
too:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">plus-two</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nv">plus</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="mi">2</span><span class="p">))</span>
</pre></div>
<p>We're trying to call <code>plus</code> but it has not been defined. We should be
able to fail quickly here too. But that means we need to track the
scope of function <strong>names</strong> in addition to variables. We'll choose to
track function names and variable names in the same scope dictionary.</p>
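<p>As a sketch of what a shared dictionary could hold, function names
might map to assembly labels while parameters map to registers. The
<code>checkCallable</code> helper, the <code>plus_two</code> label, and
the contents of <code>builtins</code> are all assumptions made for
illustration.</p>

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Hypothetical scope contents: one dictionary for both kinds of names
const scope = {
  'plus-two': 'plus_two', // a previously defined function (assumed label)
  a: 'RDI',               // a parameter of the current function
};

// Hypothetical table of builtin functions and their labels
const builtins = { '+': 'plus' };

// Hypothetical check that a call target is known before emitting CALL
function checkCallable(fun, scope, builtins) {
  if (!has(builtins, fun) && !has(scope, fun)) {
    throw new Error('Attempt to call undefined function: ' + fun);
  }
}

checkCallable('+', scope, builtins);        // ok, a builtin
checkCallable('plus-two', scope, builtins); // ok, found in scope

try {
  checkCallable('plus', scope, builtins);
} catch (e) {
  console.log(e.message); // Attempt to call undefined function: plus
}
```

<p>Because there is only one dictionary, a parameter that happens to
share a name with a function would replace that function's entry for
the rest of the scope.</p>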
<p class="note">
This is the distinction between a lisp-1 and a lisp-2. We are a
lisp-1 like Scheme because we have a single scope. Common Lisp is a
lisp-2 because it stores function name scope separately from
variable name scope.
</p><h3 id="implementing-scope">Implementing scope</h3><p>We need to revise every compile function to accept a scope dictionary
(specifically: <code>compile</code>, <code>compile_expression</code>, <code>compile_call</code>, and
<code>compile_define</code>). If a variable is referenced, we need to look up
its location in the scope dictionary. If a variable is defined
(e.g. a function name or a function parameter) we need to add a
mapping to the scope dictionary.</p>
<p>Modifying <code>compile_expression</code> is easiest:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Is a nested function call, compile it</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">scope</span><span class="p">[</span><span class="nx">arg</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nb">Number</span><span class="p">.</span><span class="nx">isInteger</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb">, </span><span class="si">${</span><span class="nx">scope</span><span class="p">[</span><span class="nx">arg</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Attempt to reference undefined variable or unsupported literal: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">arg</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
<p>Next we modify <code>compile_call</code>:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">primitive_functions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">primitive_functions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Save param registers</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span>
<span class="w"> </span><span class="c1">// Compile registers and store as params</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">));</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">validFunction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">BUILTIN_FUNCTIONS</span><span class="p">[</span><span class="nx">fun</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">scope</span><span class="p">[</span><span class="nx">fun</span><span class="p">];</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">validFunction</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`CALL </span><span class="si">${</span><span class="nx">validFunction</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">'Attempt to call undefined function: '</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fun</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Restore param registers</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">destination</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">destination</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb">, RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
</pre></div>
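<p>To make the save/load/call/restore order concrete, here is a minimal standalone sketch. It is not the compiler's own code: <code>emitCall</code>, the <code>lines</code> buffer, and the two-entry <code>PARAM_REGISTERS</code> list are hypothetical stand-ins, and arguments are assumed to be integer literals.</p>

```javascript
// Hypothetical stand-ins: a tiny line buffer and a two-register parameter list.
const PARAM_REGISTERS = ['RDI', 'RSI'];
const lines = [];
const emit = (line) => lines.push(line);

// Emit the caller-side sequence for calling `label` with integer literals.
function emitCall(label, args) {
  // Save the parameter registers we are about to overwrite.
  args.forEach((_, i) => emit(`PUSH ${PARAM_REGISTERS[i]}`));
  // Load each argument into its parameter register.
  args.forEach((arg, i) => emit(`MOV ${PARAM_REGISTERS[i]}, ${arg}`));
  emit(`CALL ${label}`);
  // Restore in reverse order, since the stack is last-in, first-out.
  args.forEach((_, i) => emit(`POP ${PARAM_REGISTERS[args.length - i - 1]}`));
}

emitCall('plus', [1, 2]);
console.log(lines.join('\n'));
```

<p>The pops mirror the pushes: <code>RSI</code> was pushed last, so it is popped first.</p>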
<p>And then <code>compile_define</code> where we modify scope for the first time:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Add this function to outer scope</span>
<span class="w"> </span><span class="nx">scope</span><span class="p">[</span><span class="nx">name</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">name</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/-/g</span><span class="p">,</span><span class="w"> </span><span class="s1">'_'</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren't exposed in outer scope.</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">...</span><span class="nx">scope</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">scope</span><span class="p">[</span><span class="nx">name</span><span class="p">]</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span>
<span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">(</span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">param</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">register</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span>
<span class="w"> </span><span class="c1">// Store parameter mapped to associated register</span>
<span class="w"> </span><span class="nx">childScope</span><span class="p">[</span><span class="nx">param</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">register</span><span class="p">;</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="c1">// Pass childScope in for reference when body is compiled.</span>
<span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">body</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">,</span><span class="w"> </span><span class="nx">childScope</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'RET\n'</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
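<p>The spread copy is what keeps parameter mappings from leaking out. A small sketch of that behavior on its own, with made-up scope contents (the <code>outerScope</code> entries here are assumptions for illustration only):</p>

```javascript
// Hypothetical scope contents, mirroring the shape compile_define works with.
const outerScope = { 'plus-two': 'plus_two' };

// A shallow copy: the child starts with everything the outer scope has.
const childScope = { ...outerScope };

// A parameter mapping added to the child is invisible to the outer scope.
childScope['a'] = 'RDI';

console.log(childScope['plus-two']); // plus_two
console.log(childScope['a']);        // RDI
console.log(outerScope['a']);        // undefined
```

<p>Because the function's own name is added to the outer scope before the copy is made, the child scope can also see it, which is what would let a function body refer to itself.</p>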
<p>And finally we need to modify the entrypoint <code>compile</code>:</p>
<div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit_prefix</span><span class="p">();</span>
<span class="w"> </span><span class="c1">// Pass in new, empty scope mapping</span>
<span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">ast</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">,</span><span class="w"> </span><span class="p">{});</span>
<span class="w"> </span><span class="nx">emit_postfix</span><span class="p">();</span>
<span class="p">}</span>
</pre></div>
<p>And scope-wise we're pretty good!</p>
<h3 id="function-calling-convention:-callee">Function calling convention: callee</h3><p>We currently have a problem: we're using parameter registers to
store local variables, which conflicts with how we store
parameters for function calls made within the function itself.</p>
<p>Ideally we could store function local variables (including the
parameters when we get them) separately from how we store function
call parameters within the function.</p>
<p>Thankfully, according to the calling convention we've followed, we're
given a set of registers that are callee-preserved. Of them we'll use
<code>RBX</code>, <code>RBP</code>, and <code>R12</code>, in that order. We're free to modify them so
long as we save them on entry and restore them before returning.</p>
<p>Applying the same storing/restoring strategy to local variables as we
did for parameters, we get:</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">LOCAL_REGISTERS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span>
<span class="w"> </span><span class="s1">'RBX'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'RBP'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'R12'</span><span class="p">,</span>
<span class="p">];</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Add this function to outer scope</span>
<span class="w"> </span><span class="nx">scope</span><span class="p">[</span><span class="nx">name</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">name</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/-/g</span><span class="p">,</span><span class="w"> </span><span class="s1">'_'</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren't exposed in outer scope.</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">...</span><span class="nx">scope</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">scope</span><span class="p">[</span><span class="nx">name</span><span class="p">]</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span>
<span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">(</span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">param</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">register</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">local</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">LOCAL_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">local</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">local</span><span class="si">}</span><span class="sb">, </span><span class="si">${</span><span class="nx">register</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Store parameter mapped to associated local</span>
<span class="w"> </span><span class="nx">childScope</span><span class="p">[</span><span class="nx">param</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">local</span><span class="p">;</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="c1">// Pass childScope in for reference when body is compiled.</span>
<span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">body</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">,</span><span class="w"> </span><span class="nx">childScope</span><span class="p">);</span>
<span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">(</span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">param</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Backwards first</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">local</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">LOCAL_REGISTERS</span><span class="p">[</span><span class="nx">params</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">];</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP </span><span class="si">${</span><span class="nx">local</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="p">});</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'RET\n'</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
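<p>To see what this wraps around a function body, here is a rough sketch that builds only the prologue and epilogue for a parameter list. <code>frameFor</code> is a hypothetical helper, and the three-entry <code>PARAM_REGISTERS</code> list is an assumption about the order the real one uses.</p>

```javascript
// Assumed register lists; only the first few entries matter for this sketch.
const PARAM_REGISTERS = ['RDI', 'RSI', 'RDX'];
const LOCAL_REGISTERS = ['RBX', 'RBP', 'R12'];

// Hypothetical helper: returns the lines wrapped around a function body,
// plus the scope mapping the body would be compiled with.
function frameFor(params) {
  const prologue = [];
  const scope = {};
  params.forEach((param, i) => {
    // Save the callee-preserved register, then copy the parameter into it.
    prologue.push(`PUSH ${LOCAL_REGISTERS[i]}`);
    prologue.push(`MOV ${LOCAL_REGISTERS[i]}, ${PARAM_REGISTERS[i]}`);
    scope[param] = LOCAL_REGISTERS[i];
  });
  // Restore in reverse push order.
  const epilogue = params.map(
    (_, i) => `POP ${LOCAL_REGISTERS[params.length - i - 1]}`
  );
  return { prologue, epilogue, scope };
}

const frame = frameFor(['a', 'b']);
console.log(frame.prologue.join('\n'));
console.log(frame.epilogue.join('\n'));
```

<p>With only three local registers listed, a sketch like this (and the function above) handles at most three parameters.</p>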
<p>And we're set.</p>
<h3 id="cleanup">Cleanup</h3><p>We've still got a few messes going on:</p>
<ul>
<li><code>emit_prefix</code> wraps our entire body in <code>_main</code>, but we now require our own <code>main</code></li>
<li>we emit to stdout instead of to a file</li>
<li>multiple function definitions are treated as nonsense</li>
</ul>
<p>Starting with the first, we rewrite <code>emit_prefix</code> and <code>emit_postfix</code> so that
our <code>_main</code> just calls <code>main</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">emit_prefix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'.global _main\n'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'.text\n'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">'plus:'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'ADD RDI, RSI'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'MOV RAX, RDI'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'RET\n'</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">emit_postfix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">'_main:'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'CALL main'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'MOV RDI, RAX'</span><span class="p">);</span><span class="w"> </span><span class="c1">// Set exit arg</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, </span><span class="si">${</span><span class="nx">SYSCALL_MAP</span><span class="p">[</span><span class="s1">'exit'</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'SYSCALL'</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>Next, to write to a file instead of stdout, we need our
<code>emit</code> function to write to a buffer. We'll let <code>ulisp.js</code> write that
buffer to a file. Because we're incredibly lazy, we'll do this all
globally.</p>
<div class="highlight"><pre><span></span><span class="kd">let</span><span class="w"> </span><span class="nx">OUT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">indent</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Array</span><span class="p">(</span><span class="nx">depth</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="s1">' '</span><span class="p">);</span>
<span class="w"> </span><span class="nx">OUT</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">indent</span><span class="si">}${</span><span class="nx">args</span><span class="si">}</span><span class="sb">\n`</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">OUT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="nx">emit_prefix</span><span class="p">();</span>
<span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">ast</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">,</span><span class="w"> </span><span class="p">{});</span>
<span class="w"> </span><span class="nx">emit_postfix</span><span class="p">();</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">OUT</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
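<p>If the global buffer ever becomes a nuisance, one possible alternative is to give each compile its own buffer. This is only a sketch: <code>createEmitter</code> and its <code>output</code> method are hypothetical names, and two spaces per indentation level is an assumption.</p>

```javascript
// Hypothetical alternative to a global OUT: each compile gets its own buffer.
function createEmitter() {
  const lines = [];
  return {
    // depth is the indentation level; two spaces per level is an assumption.
    emit(depth, text) {
      lines.push('  '.repeat(depth) + text);
    },
    // Join everything emitted so far into one newline-terminated string.
    output() {
      return lines.join('\n') + '\n';
    },
  };
}

const out = createEmitter();
out.emit(0, '_main:');
out.emit(1, 'CALL main');
console.log(out.output());
```

<p>The trade-off is that the emitter would have to be passed to every <code>compile_*</code> function, which is exactly the plumbing the global version avoids.</p>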
<p>And modify <code>ulisp.js</code>:</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">cp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'child_process'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'fs'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">parse</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'./parser'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">compile</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'./compiler'</span><span class="p">);</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">input</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="nx">args</span><span class="p">[</span><span class="mf">2</span><span class="p">]).</span><span class="nx">toString</span><span class="p">();</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">ast</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">input</span><span class="p">);</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">program</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">compile</span><span class="p">(</span><span class="nx">ast</span><span class="p">);</span>
<span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">mkdirSync</span><span class="p">(</span><span class="s1">'build'</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">(</span><span class="nx">e</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span>
<span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">writeFileSync</span><span class="p">(</span><span class="s1">'build/prog.s'</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">);</span>
<span class="w"> </span><span class="nx">cp</span><span class="p">.</span><span class="nx">execSync</span><span class="p">(</span><span class="s1">'gcc -mstackrealign -masm=intel -o build/a.out build/prog.s'</span><span class="p">);</span>
<span class="p">}</span>
<span class="nx">main</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">);</span>
</pre></div>
<p>And we're finally ready to run a simple program.</p>
<h3 id="a-program!">A program!</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.lisp
<span class="o">(</span>def<span class="w"> </span>main<span class="w"> </span><span class="o">()</span><span class="w"> </span><span class="o">(</span>+<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="m">2</span><span class="o">))</span>
$<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span>test.lisp
$<span class="w"> </span>./build/a.out
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">3</span>
</pre></div>
<p>Hurray! Now let's try defining and calling a second function
to validate parameter behavior.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.lisp
<span class="o">(</span>def<span class="w"> </span>plus-two<span class="w"> </span><span class="o">(</span>a<span class="o">)</span>
<span class="w"> </span><span class="o">(</span>+<span class="w"> </span>a<span class="w"> </span><span class="m">2</span><span class="o">))</span>
<span class="o">(</span>def<span class="w"> </span>main<span class="w"> </span><span class="o">()</span>
<span class="w"> </span><span class="o">(</span>plus-two<span class="w"> </span><span class="m">3</span><span class="o">))</span>
$<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span>test.lisp
$<span class="w"> </span>./build/a.out
./compiler.js:106
<span class="w"> </span>throw<span class="w"> </span>new<span class="w"> </span>Error<span class="o">(</span><span class="s1">'Attempt to call undefined function: '</span><span class="w"> </span>+<span class="w"> </span>fun<span class="o">)</span><span class="p">;</span>
<span class="w"> </span>^
Error:<span class="w"> </span>Attempt<span class="w"> </span>to<span class="w"> </span>call<span class="w"> </span>undefined<span class="w"> </span><span class="k">function</span>:<span class="w"> </span>p2
...
</pre></div>
<p>We start getting some really weird errors. The reason is that
our compiler doesn't know how to deal with sibling S-expressions.</p>
<p>So we'll introduce a new primitive function called <code>begin</code> that calls
all its sibling functions and returns the value of the last
call. Then we'll wrap the program in an implicit <code>begin</code> so we don't
need to write one ourselves.</p>
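<p>As a rough picture of what the implicit wrapping means at the AST level, consider the sketch below. The nested-array shape of <code>program</code> is an assumption about what the parser returns for the two-definition <code>test.lisp</code>, and <code>wrapped</code> is a hypothetical name.</p>

```javascript
// Assumed parser output for the two-definition test.lisp above.
const program = [
  ['def', 'plus-two', ['a'], ['+', 'a', 2]],
  ['def', 'main', [], ['plus-two', 3]],
];

// Implicit begin: prepend the primitive's name so the sibling
// definitions become its arguments.
const wrapped = ['begin', ...program];

// The wrapped form now looks like any other call: a name plus arguments.
const [fun, ...args] = wrapped;
console.log(fun);         // begin
console.log(args.length); // 2
```

<p>Once the top level has this shape, the siblings are just arguments to a single call.</p>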
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_begin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">body</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">));</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">destination</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="nx">destination</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb">, RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="p">}</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">primitive_functions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">,</span>
<span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="nx">compile_begin</span><span class="p">,</span>
<span class="p">};</span>
<span class="p">...</span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">OUT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="nx">emit_prefix</span><span class="p">();</span>
<span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="s1">'begin'</span><span class="p">,</span><span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="s1">'RAX'</span><span class="p">,</span><span class="w"> </span><span class="p">{});</span>
<span class="w"> </span><span class="nx">emit_postfix</span><span class="p">();</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">OUT</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>And we try our test program again. :)</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.lisp
<span class="o">(</span>def<span class="w"> </span>plus-two<span class="w"> </span><span class="o">(</span>a<span class="o">)</span>
<span class="w"> </span><span class="o">(</span>+<span class="w"> </span>a<span class="w"> </span><span class="m">2</span><span class="o">))</span>
<span class="o">(</span>def<span class="w"> </span>main<span class="w"> </span><span class="o">()</span>
<span class="w"> </span><span class="o">(</span>plus-two<span class="w"> </span><span class="m">3</span><span class="o">))</span>
$<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span>test.lisp
$<span class="w"> </span>./build/a.out
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">5</span>
</pre></div>
<p>And that's all there is to it! Stay tuned for the next post on
conditionals and tail-call optimization.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Part two on compiler basics using JavaScript: user-defined functions and variables <a href="https://t.co/XOam67HO8h">https://t.co/XOam67HO8h</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1087103061590446083?ref_src=twsrc%5Etfw">January 20, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/compiler-basics-functions.htmlSun, 20 Jan 2019 00:00:00 +0000
- Make small changes and solve the problems you havehttp://notes.eatonphil.com/make-small-changes-and-solve-the-problems-you-have.html<p>Two frustrating things that can happen in an organization are 1) big
changes and 2) changes that aren’t clearly associated with a known
problem. It’s even worse in that order.</p>
<p>These situations tend to happen when a problem remains unaddressed for
too long. They also tend to happen when there is not a strong
enough emphasis on respect for all employees -- their experience,
ideas, and feelings.</p>
<p>I try to avoid these issues in teams I run by starting early with a
problem statement. Specifically when there’s a problem I’d like to
solve, I’ll mention it in our fortnightly team retro. If there’s
general agreement a problem exists, we begin looking for the least
invasive/least effort way to fix the problem. More on that later.</p>
<p>If the problem is not well understood or not widely enough shared, I’ll
table the discussion until I can talk with more people to better
articulate the problem. Or maybe there isn’t a problem after all.</p>
<p>This process of clarifying a problem and agreeing that it exists is the only
appropriate first step when making a change. It is important to
provide sufficient context to affected employees.</p>
<p>After the problem is understood I begin to suggest possible solutions
-- soliciting feedback and alternatives. But making sure a problem is
well understood is not the same thing as making sure that potential
solutions could reasonably solve the problem. Throughout the
discussion of solutions I try to repeatedly make sure that proposed
solutions could actually address the problem.</p>
<p>From there I try to steer discussion of solutions to ones that are
easiest to make and least invasive. Small changes are easier to
make. There is little room for disagreement when there is little
changing.</p>
<p>Making small changes among a small group of people is even easier. The
few disagreements that you find when making small changes among a
small group of people give you a chance to prove or improve the
solution before introducing it to a larger group.</p>
<p>Communicating frequently and effectively should be a clear theme here.</p>
<p>At this point if there is a single most reasonable solution, I’ll pick
it unless there is serious disagreement. Most of the time folks are
amenable to the need for a solution to be chosen to solve a problem
they agreed existed, even if they don’t love the solution.</p>
<p>If there is no clear solution or there is serious disagreement, go
back a few paragraphs and start over to understand the problem and
solicit feedback and alternatives for solutions. Or take the heat of
serious disagreement.</p>
<p>This is a philosophy. It’s difficult to prove the effectiveness one
way or the other -- especially over the mid-to-long-term. But the
logic makes sense to me, it agrees with what I’ve read on management,
and has worked effectively in teams I’ve run so far.</p>
<p>Further reading:</p>
<ul>
<li><a href="https://amzn.to/2GHlro5">Peopleware: Productive Projects and Teams</a></li>
<li><a href="https://amzn.to/2BGEysM">Managing Transitions: Making the Most of Change</a></li>
<li><a href="https://amzn.to/2LA34Ar">Thinking, Fast and Slow</a></li>
<li><a href="https://amzn.to/2LDfQOz">Site Reliability Engineering: How Google Runs Production Systems</a></li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a post expanding on a side of this: make small changes and solve the problems you have <a href="https://t.co/FXepELSHMx">https://t.co/FXepELSHMx</a> <a href="https://t.co/mVsT1KFhKc">https://t.co/mVsT1KFhKc</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1078312937348059136?ref_src=twsrc%5Etfw">December 27, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/make-small-changes-and-solve-the-problems-you-have.htmlThu, 27 Dec 2018 00:00:00 +0000
- Writing a lisp compiler from scratch in JavaScript: 1. lisp to assemblyhttp://notes.eatonphil.com/compiler-basics-lisp-to-assembly.html<p class="note">
Next in compiler basics:
<! forgive me, for I have sinned >
<br />
<a href="/compiler-basics-functions.html">2. user-defined functions and variables</a>
<br />
<a href="/compiler-basics-llvm.html">3. LLVM</a>
<br />
<a href="/compiler-basics-llvm-conditionals.html">4. LLVM conditionals and compiling fibonacci</a>
<br />
<a href="/compiler-basics-llvm-system-calls.html">5. LLVM system calls</a>
<br />
<a href="/compiler-basics-an-x86-upgrade.html">6. an x86 upgrade</a>
</p><p>In this post we'll write a simple compiler in JavaScript (on Node)
without any third-party libraries. Our goal is to take an input
program like <code>(+ 1 (+ 2 3))</code> and produce an output assembly program
that does these operations to produce <code>6</code> as the exit code. The
resulting compiler can be found
<a href="https://github.com/eatonphil/ulisp">here</a>.</p>
<p>We'll cover:</p>
<ul>
<li>Parsing</li>
<li>Code generation</li>
<li>Assembly basics</li>
<li>Syscalls</li>
</ul>
<p>And for now we'll omit:</p>
<ul>
<li>Programmable function definitions</li>
<li>Non-symbol/-numeric data types</li>
<li>More than 3 function arguments</li>
<li>Lots of safety</li>
<li>Lots of error messages</li>
</ul>
<h3 id="parsing">Parsing</h3><p>We pick the <a href="https://en.wikipedia.org/wiki/S-expression">S-expression</a>
syntax mentioned earlier because it's very easy to parse. Furthermore,
our input language is so limited that we won't even break our parser
into separate lexing/parsing stages.</p>
<p class="note">
Once you need to support string literals, comments, decimal
literals, and other more complex literals it becomes easier to use
separate stages.
<br />
<br />
If you're curious about these separate stages of parsing, you may be
interested in my post
on <a href="http://notes.eatonphil.com/writing-a-simple-json-parser.html">writing
a JSON parser</a>.
<br />
<br />
Or, check out my BSDScheme project for a fully-featured
<a href="https://github.com/eatonphil/bsdscheme/blob/master/src/lex.d">lexer</a>
and
<a href="https://github.com/eatonphil/bsdscheme/blob/master/src/parse.d">parser</a>
for Scheme.
</p><p>The parser should produce an Abstract Syntax Tree (AST), a data
structure representing the input program. Specifically, we want <code>(+ 1 (+ 2 3))</code>
to produce <code>['+', 1, ['+', 2, 3]]</code> in JavaScript.</p>
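<p>To make the shape of the AST concrete, here is a minimal sketch that walks the nested array by hand. The <code>evaluate</code> function is hypothetical and is not part of the compiler; it assumes that the only function is <code>+</code> and that every other node is a number.</p>

```javascript
// Hypothetical helper for illustration only; the compiler itself
// never evaluates the program, it generates assembly instead.
function evaluate(ast) {
  // A leaf of the tree is a plain number.
  if (!Array.isArray(ast)) {
    return ast;
  }
  // A list is a function name followed by its arguments.
  const [fun, ...args] = ast;
  if (fun !== '+') {
    throw new Error('Unsupported function in this sketch: ' + fun);
  }
  return args.map(evaluate).reduce((sum, n) => sum + n, 0);
}

const ast = ['+', 1, ['+', 2, 3]];
console.log(evaluate(ast)); // 6
```

<p>The first element of every list is the function name and the remaining elements are its arguments, which may themselves be lists.</p>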
<p>There are many different ways to go about parsing but the most
intuitive to me is to have a function that accepts a program (a
string) and returns a tuple containing the program parsed so far (an
AST) and the rest of the program (a string) that hasn't been
parsed.</p>
<p>That leaves us with a function skeleton that looks like this:</p>
<div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">parse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="nx">logic</span><span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="nx">be</span><span class="w"> </span><span class="nx">added</span><span class="w"> </span><span class="p">...</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">];</span>
<span class="p">};</span>
</pre></div>
<p>The code that initially calls parse will thus have to deal with
unwrapping the outermost tuple to get to the AST. For a more helpful
compiler we could check that the entire program <em>was</em> actually parsed
by failing if the second element of the return result is not the empty
string.</p>
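<p>That check could be sketched roughly as follows. The names <code>parseEntireProgram</code> and <code>stubParse</code> are made up for this example; the only assumption about the real parser is that it returns a two-element array of the AST and the unparsed rest.</p>

```javascript
// Hypothetical wrapper, not part of the original compiler. It accepts
// any parse function that returns [ast, rest].
function parseEntireProgram(parse, program) {
  const [ast, rest] = parse(program);
  // Anything other than trailing whitespace means input was left unparsed.
  if (rest.trim() !== '') {
    throw new Error('Could not parse the rest of the program: ' + rest);
  }
  return ast;
}

// Stand-in for the real parser, for demonstration only: it treats
// everything up to the first ')' as parsed and returns what follows.
function stubParse(program) {
  const end = program.indexOf(')');
  if (end === -1) {
    return [[program], ''];
  }
  return [[program.substring(0, end + 1)], program.substring(end + 1)];
}

console.log(parseEntireProgram(stubParse, '(+ 1 2)'));
```

<p>Passing the parser in as an argument keeps the wrapper independent of how parsing itself is implemented.</p>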
<p>Within the function we will iterate over each character and accumulate
until we hit space, left or right parenthesis:</p>
<div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">parse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="kr">char</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">charAt</span><span class="p">(</span><span class="nx">i</span><span class="p">);</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="kr">char</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'('</span><span class="o">:</span><span class="w"> </span><span class="c1">// TODO</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">')'</span><span class="o">:</span><span class="w"> </span><span class="c1">// TODO</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">' '</span><span class="o">:</span>
<span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="o">+</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">currentToken</span><span class="p">);</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="kr">char</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">];</span>
<span class="p">};</span>
</pre></div>
<p>The recursive parts are always the most challenging. The right paren
is easiest. We must push the current token and return all tokens with
the rest of the program:</p>
<div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">parse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="kr">char</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">charAt</span><span class="p">(</span><span class="nx">i</span><span class="p">);</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="kr">char</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'('</span><span class="o">:</span><span class="w"> </span><span class="c1">// TODO</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">')'</span><span class="o">:</span>
<span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="o">+</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">currentToken</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">)];</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">' '</span><span class="o">:</span>
<span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="o">+</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">currentToken</span><span class="p">);</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="kr">char</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">];</span>
<span class="p">};</span>
</pre></div>
<p>Finally the left paren should recurse, add the parsed tokens to the
list of sibling tokens, and force the loop to start at the new
unparsed point.</p>
<div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">parse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span>
<span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o"><</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="kr">char</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">charAt</span><span class="p">(</span><span class="nx">i</span><span class="p">);</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="kr">char</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">'('</span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">parsed</span><span class="p">,</span><span class="w"> </span><span class="nx">rest</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">));</span>
<span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">parsed</span><span class="p">);</span>
<span class="w"> </span><span class="nx">program</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">rest</span><span class="p">;</span>
<span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">')'</span><span class="o">:</span>
<span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="o">+</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">currentToken</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">)];</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">' '</span><span class="o">:</span>
<span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="o">+</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">currentToken</span><span class="p">);</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
<span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="kr">char</span><span class="p">;</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">];</span>
<span class="p">};</span>
</pre></div>
<p>Assuming this is all in <code>parser.js</code>, let's try it out in the REPL:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>node
><span class="w"> </span>const<span class="w"> </span><span class="o">{</span><span class="w"> </span>parse<span class="w"> </span><span class="o">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>require<span class="o">(</span><span class="s1">'./parser'</span><span class="o">)</span><span class="p">;</span>
undefined
><span class="w"> </span>console.log<span class="o">(</span>JSON.stringify<span class="o">(</span>parse<span class="o">(</span><span class="s1">'(+ 3 (+ 1 2))'</span><span class="o">)))</span><span class="p">;</span>
<span class="o">[[[</span><span class="s2">"+"</span>,3,<span class="o">[</span><span class="s2">"+"</span>,1,2<span class="o">]]]</span>,<span class="s2">""</span><span class="o">]</span>
</pre></div>
<p>Solid. We move on.</p>
<h3 id="assembly-101">Assembly 101</h3><p>Assembly is essentially the lowest-level programming language we can
use. It is a human-readable, 1:1 representation of the binary
instructions the CPU can interpret. Conversion from assembly to
binary is done with an assembler; the reverse step is done with a
disassembler. We'll use <code>gcc</code> for assembling since it deals with some
<a href="http://fabiensanglard.net/macosxassembly/index.php">oddities</a> of
assembly programming on macOS.</p>
<p>The primary data structures in assembly are registers (temporary
variables stored by the CPU) and the program stack. Every function in
a program has access to the same registers, but convention cordons
off sections of the stack for each function so it ends up being a
slightly more durable store than registers. <code>RAX</code>, <code>RDI</code>, <code>RDX</code>, and
<code>RSI</code> are a few registers available to us.</p>
<p>Now we only need to know a few instructions to compile our program
(the rest of programming assembly is convention):</p>
<ul>
<li><code>MOV</code>: store one register's content into another, or store a literal number into a register</li>
<li><code>ADD</code>: store the sum of two registers' contents in the first register</li>
<li><code>PUSH</code>: store a register's content on the stack</li>
<li><code>POP</code>: remove the top-most value from the stack and store in a register</li>
<li><code>CALL</code>: enter a new section of the stack and start running the function</li>
<li><code>RET</code>: return to the calling function's section of the stack and resume at the instruction after the call</li>
<li><code>SYSCALL</code>: like <code>CALL</code> but where the function is handled by the kernel</li>
</ul>
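<p>To get a feel for how these instructions combine, here is a toy model in JavaScript. It is only a sketch: the <code>registers</code>, <code>stack</code>, and <code>instructions</code> objects are invented for this example, <code>CALL</code>, <code>RET</code>, and <code>SYSCALL</code> are left out, and the instruction sequence is written by hand rather than being what the compiler emits.</p>

```javascript
// Toy model for illustration only: four registers and a stack.
// CALL, RET, and SYSCALL are left out to keep the sketch short.
const registers = { RAX: 0, RDI: 0, RSI: 0, RDX: 0 };
const stack = [];

const instructions = {
  // MOV copies a register or a literal number into the destination.
  MOV: (dst, src) => {
    registers[dst] = typeof src === 'number' ? src : registers[src];
  },
  // ADD stores the sum of both registers in the first one.
  ADD: (dst, src) => {
    registers[dst] += registers[src];
  },
  // PUSH copies a register's content onto the top of the stack.
  PUSH: (src) => {
    stack.push(registers[src]);
  },
  // POP removes the top-most value and stores it in a register.
  POP: (dst) => {
    registers[dst] = stack.pop();
  },
};

// Hand-written instruction sequence for 1 + (2 + 3).
const program = [
  ['MOV', 'RDI', 1],
  ['PUSH', 'RDI'],       // save 1 while RDI is reused
  ['MOV', 'RDI', 2],
  ['MOV', 'RSI', 3],
  ['ADD', 'RDI', 'RSI'], // RDI = 5
  ['MOV', 'RSI', 'RDI'], // RSI = 5
  ['POP', 'RDI'],        // RDI = 1
  ['ADD', 'RDI', 'RSI'], // RDI = 6
  ['MOV', 'RAX', 'RDI'],
];

for (const [name, ...operands] of program) {
  instructions[name](...operands);
}

console.log(registers.RAX); // 6
```

<p>The <code>PUSH</code>/<code>POP</code> pair is what lets the same register hold an intermediate value while a nested expression is computed.</p>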
<h3 id="function-calling-convention">Function calling convention</h3><p>Assembly instructions are flexible enough that there is no
language-defined way to make function calls. Therefore it is important
to answer (at least) the following few questions:</p>
<ul>
<li>Where are parameters stored by the caller so that the callee has access to them?</li>
<li>Where is the return value stored by the callee so the caller has access to it?</li>
<li>What registers are saved by whom?</li>
</ul>
<p>Without getting too far into the specifics, we'll assume the following
answers for development on x86_64 macOS and Linux systems:</p>
<ul>
<li>Parameters are stored (in order) in the <code>RDI</code>, <code>RSI</code>, and <code>RDX</code> registers<ul>
<li>We won't support passing more than three arguments</li>
</ul>
</li>
<li>The return value is stored in the <code>RAX</code> register</li>
<li><code>RDI</code>, <code>RSI</code>, and <code>RDX</code> registers are saved by the caller</li>
</ul>
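<p>These answers can be written down as data. The sketch below is an assumption about how one might encode them; <code>PARAM_REGISTERS</code>, <code>RETURN_REGISTER</code>, and <code>assignParamRegisters</code> are names chosen for this example and may not match the compiler's own code.</p>

```javascript
// Assumed names for illustration; the compiler may organize this differently.
const PARAM_REGISTERS = ['RDI', 'RSI', 'RDX'];
const RETURN_REGISTER = 'RAX';

// Pair each argument with the register that carries it to the callee.
function assignParamRegisters(args) {
  if (args.length > PARAM_REGISTERS.length) {
    throw new Error(
      'At most ' + PARAM_REGISTERS.length + ' arguments are supported'
    );
  }
  return args.map((arg, i) => ({ register: PARAM_REGISTERS[i], arg }));
}

console.log(assignParamRegisters([1, 2]));
console.log('The caller reads the result from ' + RETURN_REGISTER);
```

<p>Failing loudly on a fourth argument keeps the three-argument limit explicit instead of silently dropping values.</p>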
<h3 id="code-generation">Code generation</h3><p>With assembly basics and the function call convention in mind, we've
got enough to generate code from the parsed program's AST.</p>
<p>The skeleton of our compile code will look like this:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">code</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">indent</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">' '</span><span class="p">.</span><span class="nx">repeat</span><span class="p">(</span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">indent</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">compile_argument</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// If arg AST is a list, call compile_call on it</span>
<span class="w"> </span><span class="c1">// Else must be a literal number, store in destination register</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Save param registers to the stack</span>
<span class="w"> </span><span class="c1">// Compile arguments and store in param registers</span>
<span class="w"> </span><span class="c1">// Call function</span>
<span class="w"> </span><span class="c1">// Restore param registers from the stack</span>
<span class="w"> </span><span class="c1">// Move result into destination if provided</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">emit_prefix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Assembly prefix</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">emit_postfix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Assembly postfix</span>
<span class="p">}</span>
<span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">compile</span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit_prefix</span><span class="p">();</span>
<span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">ast</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">));</span>
<span class="w"> </span><span class="nx">emit_postfix</span><span class="p">();</span>
<span class="p">};</span>
</pre></div>
<p>From our pseudo-code in comments it is simple enough to fill in.
Let's fill in everything but the prefix and postfix code.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_argument</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// If arg AST is a list, call compile_call on it</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">destination</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="c1">// Else must be a literal number, store in destination register</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb">, </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">BUILTIN_FUNCTIONS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="s1">'+'</span><span class="o">:</span><span class="w"> </span><span class="s1">'plus'</span><span class="w"> </span><span class="p">};</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">PARAM_REGISTERS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s1">'RDI'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RSI'</span><span class="p">,</span><span class="w"> </span><span class="s1">'RDX'</span><span class="p">];</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="c1">// Save param registers to the stack</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span>
<span class="w"> </span><span class="c1">// Compile arguments and store in param registers</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">compile_argument</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">]));</span>
<span class="w"> </span><span class="c1">// Call function</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`CALL </span><span class="si">${</span><span class="nx">BUILTIN_FUNCTIONS</span><span class="p">[</span><span class="nx">fun</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">fun</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Restore param registers from the stack</span>
<span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=></span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span>
<span class="w"> </span><span class="c1">// Move result into destination if provided</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb">, RAX`</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">''</span><span class="p">);</span><span class="w"> </span><span class="c1">// For nice formatting</span>
<span class="p">}</span>
</pre></div>
<p>In a better compiler, we would not make <code>plus</code> a built-in
function. We'd emit code for the assembly instruction <code>ADD</code>. However,
making <code>plus</code> a function makes code generation simpler and also allows
us to see what function calls look like.</p>
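<p>For comparison, here is a rough sketch of what emitting <code>ADD</code>
directly might look like. It is not the approach this post takes: it
uses a simple stack-based scheme where every expression leaves its
result in <code>RAX</code>, and the name <code>compileExpression</code> and the <code>out</code>
array are invented for the example. It assumes the AST is a nested
array of the form <code>['+', left, right]</code> with number literals at the
leaves.</p>

```javascript
// Hypothetical alternative to calling `plus`: emit ADD inline.
// Every expression leaves its result in RAX. The stack holds the left
// operand while the right operand is being computed.
function compileExpression(expr, out) {
  if (!Array.isArray(expr)) {
    out.push(`MOV RAX, ${expr}`); // literal number
    return out;
  }
  const [fun, left, right] = expr;
  if (fun !== '+' || expr.length !== 3) {
    throw new Error(`Only two-argument + is handled in this sketch: ${JSON.stringify(expr)}`);
  }
  compileExpression(left, out);
  out.push('PUSH RAX');          // save the left operand
  compileExpression(right, out); // right operand is now in RAX
  out.push('POP RDI');           // restore the left operand into RDI
  out.push('ADD RAX, RDI');      // RAX = right + left
  return out;
}

console.log(compileExpression(['+', 3, ['+', 2, 1]], []).join('\n'));
```

<p>The rest of this post sticks with <code>plus</code> as a function.</p>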
<p>We'll define the <code>plus</code> built-in function in the prefix code.</p>
<h3 id="the-prefix">The prefix</h3><p>Assembly programs consist of a few "sections" in memory, the most
important of which are the <code>text</code> and <code>data</code> sections. <code>text</code> is a
read-only section where the program instructions themselves are
stored. The CPU is instructed to start interpreting from some location
in this text section and it will increment through instructions,
evaluating each instruction until it reaches an instruction that tells
it to jump to a different location to evaluate instructions (e.g. with
<code>CALL</code>, <code>RET</code>, or <code>JMP</code>).</p>
<p>To denote the text section we emit <code>.text</code> in our prefix before we
emit our generated code.</p>
<p class="note">
The data section is for statically initialized values (e.g. global
variables). We don't have any need for that right now so we'll
ignore it.
<br />
<br />
<a href="https://www.cs.bgu.ac.il/~caspl122/wiki.files/lab2/ch07lev1sec6/ch07lev1sec6.html">Here</a>
is a good read with more detail on these (and other) sections.
</p><p>Additionally, we need to emit an entrypoint (we'll use <code>_main</code>) and
add a notice (<code>.global _main</code>) so that the location of this entrypoint
is visible externally. This is important because we let <code>gcc</code> handle
the hairier parts of generating an executable file and it needs access
to the entrypoint.</p>
<p>So far, our prefix looks like this:</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">emit_prefix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'.global _main\n'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'.text\n'</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// TODO: add built-in functions</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">'_main:'</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>The last part of our prefix needs to include the <code>plus</code> built-in
function. For this, we add the first two parameter registers we agreed
on (<code>RDI</code> and <code>RSI</code>) and store the result in <code>RAX</code>.</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">emit_prefix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'.global _main\n'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'.text\n'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">'plus:'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'ADD RDI, RSI'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'MOV RAX, RDI'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'RET\n'</span><span class="p">);</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">'_main:'</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>And we're golden.</p>
<h3 id="the-postfix">The postfix</h3><p>The job of the postfix is simple: call <code>exit</code> with the value of
<code>RAX</code> since this will be the result of the last function called by the
program.</p>
<p><code>exit</code> is a syscall, so we'll use the <code>SYSCALL</code> instruction to call
it. The x86_64 calling convention on macOS and Linux for <code>SYSCALL</code>
passes the first three parameters in the same registers as <code>CALL</code>. But we also need to tell
<code>SYSCALL</code> which syscall to call. The convention is to set <code>RAX</code> to the
integer representing the syscall on the current system. On Linux it
is <code>60</code>; on macOS it is <code>0x2000001</code>.</p>
<p class="note">
When I say "convention", I don't mean that you really have a choice
as a programmer. It was arbitrary when the operating system and
standard libraries chose it. But if you want to write a working
program that uses syscalls or calls out to (say) glibc, you'll need
to follow these conventions.
</p><p>The postfix then looks like this:</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">os</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'os'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALL_MAP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">platform</span><span class="p">()</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">'darwin'</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s1">'exit'</span><span class="o">:</span><span class="w"> </span><span class="s1">'0x2000001'</span><span class="p">,</span>
<span class="p">}</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="s1">'exit'</span><span class="o">:</span><span class="w"> </span><span class="mf">60</span><span class="p">,</span>
<span class="p">};</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">emit_postfix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'MOV RDI, RAX'</span><span class="p">);</span><span class="w"> </span><span class="c1">// Set exit arg</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, </span><span class="si">${</span><span class="nx">SYSCALL_MAP</span><span class="p">[</span><span class="s1">'exit'</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span><span class="w"> </span><span class="c1">// Set syscall number</span>
<span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">'SYSCALL'</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>And we're set here too.</p>
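<p>One caveat when using the exit status as our program's "output": on
Linux and macOS the parent process only sees the low 8 bits of the
value passed to <code>exit</code>. The small sketch below models that; the
function name <code>observedExitStatus</code> is made up for illustration.</p>

```javascript
// The shell only sees the low 8 bits of the value passed to exit.
function observedExitStatus(value) {
  return value & 0xff;
}

console.log(observedExitStatus(6));   // 6
console.log(observedExitStatus(300)); // 44, because 300 - 256 = 44
```

<p>So a program like <code>(+ 200 100)</code> would report <code>44</code> rather than
<code>300</code>. That is fine for small test programs like the ones in this post.</p>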
<h3 id="putting-it-all-together">Putting it all together</h3><p>We can finally write our JavaScript entrypoint and run our compiler
against a sample program.</p>
<p>The entrypoint might look like this:</p>
<div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">parse</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'./parser'</span><span class="p">);</span>
<span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">compile</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">'./compiler'</span><span class="p">);</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">(</span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">script</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">args</span><span class="p">[</span><span class="mf">2</span><span class="p">];</span>
<span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">ast</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">script</span><span class="p">);</span>
<span class="w"> </span><span class="nx">compile</span><span class="p">(</span><span class="nx">ast</span><span class="p">[</span><span class="mf">0</span><span class="p">]);</span>
<span class="p">}</span>
<span class="nx">main</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">);</span>
</pre></div>
<p>And we can call it like so:</p>
<div class="highlight"><pre><span></span><span class="nf">$</span><span class="w"> </span><span class="nv">node</span><span class="w"> </span><span class="nv">ulisp.js</span><span class="w"> </span><span class="s">'(+ 3 (+ 2 1))'</span>
<span class="w"> </span><span class="nf">.global</span><span class="w"> </span><span class="nv">_main</span>
<span class="w"> </span><span class="nf">.text</span>
<span class="nl">plus:</span>
<span class="w"> </span><span class="nf">ADD</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="nb">RSI</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="nb">RDI</span>
<span class="w"> </span><span class="nf">RET</span>
<span class="nl">_main:</span>
<span class="w"> </span><span class="nf">PUSH</span><span class="w"> </span><span class="nb">RDI</span>
<span class="w"> </span><span class="nf">PUSH</span><span class="w"> </span><span class="nb">RSI</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span>
<span class="w"> </span><span class="nf">PUSH</span><span class="w"> </span><span class="nb">RDI</span>
<span class="w"> </span><span class="nf">PUSH</span><span class="w"> </span><span class="nb">RSI</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RSI</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="nf">CALL</span><span class="w"> </span><span class="nv">plus</span>
<span class="w"> </span><span class="nf">POP</span><span class="w"> </span><span class="nb">RSI</span>
<span class="w"> </span><span class="nf">POP</span><span class="w"> </span><span class="nb">RDI</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RSI</span><span class="p">,</span><span class="w"> </span><span class="nb">RAX</span>
<span class="w"> </span><span class="nf">CALL</span><span class="w"> </span><span class="nv">plus</span>
<span class="w"> </span><span class="nf">POP</span><span class="w"> </span><span class="nb">RSI</span>
<span class="w"> </span><span class="nf">POP</span><span class="w"> </span><span class="nb">RDI</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="nb">RAX</span>
<span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="mh">0x2000001</span>
<span class="w"> </span><span class="nf">SYSCALL</span>
</pre></div>
<h3 id="generating-an-executable-file-from-the-output">Generating an executable file from the output</h3><p>If we redirect the previous output to an assembly file and call <code>gcc</code>
on it, we can generate a program we can run. Then we can echo the <code>$?</code>
variable to see the exit code of the previous process.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span><span class="s1">'(+ 3 (+ 2 1))'</span><span class="w"> </span>><span class="w"> </span>program.S
$<span class="w"> </span>gcc<span class="w"> </span>-mstackrealign<span class="w"> </span>-masm<span class="o">=</span>intel<span class="w"> </span>-o<span class="w"> </span>program<span class="w"> </span>program.s
$<span class="w"> </span>./program
$<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span>
<span class="m">6</span>
</pre></div>
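<p>If you find yourself repeating these steps, they can be scripted from
Node. This is a sketch only: <code>gccArgs</code> and <code>buildAndRun</code> are
hypothetical helpers, not part of the ulisp source, and they assume
<code>gcc</code> is on your <code>PATH</code>. The flags are the same ones used in the
<code>gcc</code> command above.</p>

```javascript
const { spawnSync } = require('child_process');
const fs = require('fs');

// Same flags as the manual gcc invocation.
function gccArgs(asmPath, outPath) {
  return ['-mstackrealign', '-masm=intel', '-o', outPath, asmPath];
}

// Hypothetical helper: write the assembly to disk, build it with gcc,
// run the resulting program, and return its exit status.
function buildAndRun(assembly, asmPath = 'program.s', outPath = './program') {
  fs.writeFileSync(asmPath, assembly);
  const build = spawnSync('gcc', gccArgs(asmPath, outPath), { stdio: 'inherit' });
  if (build.status !== 0) {
    throw new Error(`gcc failed with status ${build.status}`);
  }
  return spawnSync(outPath).status;
}
```

<p>Calling <code>buildAndRun</code> with the compiler's output as a string would then
return the same number that <code>echo $?</code> prints.</p>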
<p>And we've got a working compiler! The full source of the compiler is
available <a href="https://github.com/eatonphil/ulisp">here</a>.</p>
<h3 id="further-reading">Further reading</h3><ul>
<li><a href="https://aaronbloomfield.github.io/pdr/book/x86-64bit-ccc-chapter.pdf">x86_64 calling convention</a></li>
<li>macOS assembly programming<ul>
<li><a href="http://fabiensanglard.net/macosxassembly/index.php">Stack alignment on macOS</a></li>
<li><a href="https://filippo.io/making-system-calls-from-assembly-in-mac-os-x/">Syscalls on macOS</a></li>
</ul>
</li>
<li>Destination-driven code generation<ul>
<li><a href="https://www.cs.indiana.edu/~dyb/pubs/ddcg.pdf">Kent Dybvig's original paper</a></li>
<li><a href="http://cs.au.dk/~mis/dOvs/slides/46b-codegeneration-in-V8.pdf">One-pass code generation in V8</a></li>
</ul>
</li>
</ul>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Finished that intro to compilers post :) lisp to assembly in Javascript <a href="https://t.co/0HDIn4Mv7a">https://t.co/0HDIn4Mv7a</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1066863077000441856?ref_src=twsrc%5Etfw">November 26, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/compiler-basics-lisp-to-assembly.htmlTue, 20 Nov 2018 00:00:00 +0000
- On NYC, Tokyo and Seoulhttp://notes.eatonphil.com/on-nyc-tokyo-and-seoul.html<p>I’ve lived in NYC for the past year — moved here after years in Philly
and after growing up in a rural community a few hours west of
there. My wife is South Korean and last week concluded my second trip
to the suburbs of Seoul to visit her family. We finished up that trip
with a week in Tokyo.</p>
<p>Long a mecha and Godzilla fan, I was struck by a city not
significantly more modern, or significantly more “Eastern”, than
NYC. In contrast, the lesser known Seoul is more modern than both
cities and shares as much “Eastern” vibe as Tokyo.</p>
<p>I’d go so far as to say that Seoul is the most livable of the three
for anyone of a similar background. There are a few concrete areas
that led me to this including transportation, apartments, WiFi/cafes,
food, and language.</p>
<p>I'll conclude with a few tourist recommendations and a list of books
to read on South Korea and Japan if you share my enthusiasm for
comparing life in different cities.</p>
<h3 id="transportation">Transportation</h3><p>NYC is one of the few cities in the world with a subway that runs
24/7. The Tokyo and Seoul subways do not share this trait despite being many
decades newer. (Tokyo and Seoul were heavily damaged during World War
II and the Korean War, respectively.) And despite being built later,
Tokyo subway cars are even narrower than NYC subway cars (~8.2ft
vs. ~8.5ft).</p>
<p>In contrast, Seoul subway cars are ~10.2ft wide. The difference may
seem slight but it is noticeable during rush hour when in Seoul there
is space for four people to stand in the aisle versus room for perhaps
two in a Tokyo or NYC subway car.</p>
<p><img src="https://photos.travelblog.org/Photos/10223/428861/f/4174039-Seoul-subway-car-0.jpg" alt="Seoul subway car" />
<small>Seoul subway car, source: Travel Blog</small></p>
<p>The Seoul subway system is also the most advanced in terms of
safety. All stations have a floor-to-ceiling barrier with doors that
only open when a train arrives. Most stations in Tokyo have a ~3ft
tall barrier that does the same, though some stations have no
barrier. In NYC there are no barriers anywhere.</p>
<p>Concerning innovation, Seoul and Tokyo both have multiple driverless
subway lines whereas NYC has none. But in terms of complexity the NYC
subway is the simplest because you pay only once. Seoul and Tokyo
subways are slightly more complex in that you swipe your card when you
enter and exit (or transfer).</p>
<h4 id="taxis">Taxis</h4><p>It was jarring to be greeted by the very 90s, vaguely British Toyota
Crown taxi cabs that dominate the streets of Tokyo.</p>
<p><img src="https://i.imgur.com/WuIHqxY_d.jpg?maxwidth=640&shape=thumb&fidelity=medium" alt="Toyota Crown cab" />
<small>Source: Phil Eaton</small></p>
<p>These cabs have no integrated navigation unit but a modern unit was
typically mechanically attached. We saw a few of the recently approved
Toyota JPN Taxi, but they only account for around <a href="https://www.japantimes.co.jp/news/2018/05/23/business/taxi-tokyo-prepares-olympic-tourism-boom-accessible-cabs-international-drivers/">10
percent</a>
of cabs. (The integrated navigation is massive, perhaps 10-inch
screens.) In contrast, Seoul has a
<a href="http://travel.cnn.com/seoul/life/seoul-taxi-guide-783378/">variety</a>
of modern cabs all with integrated navigation — the most common of
which is the Hyundai Sonata.</p>
<p><img src="http://www.theseoulguide.com/wp-content/uploads/2013/09/regular_orange_taxi_in_seoul.jpg" alt="Hyundai Sonata cab" />
<small>Source: The Seoul Guide</small></p>
<p>Although Japanese car companies
<a href="https://www.motortrend.com/news/12q2-1993-eunos-mazda-cosmo-drive/">pioneered</a>
integrated navigation in the 90s, it appears to have been the standard
for South Korean car companies for the past 10-20 years.</p>
<p>And then there’s NYC with its primary mix of Crown Victorias and
Priuses with multiple 4-inch smartphones mechanically attached for
navigation.</p>
<p><img src="https://thenypost.files.wordpress.com/2013/10/cab2.jpg?quality=90&strip=all" />
<small>Source: New York Post</small></p>
<h3 id="living">Living</h3><p>South Korea has no concept of the suburb oriented around single-family
houses. Drive an hour or two out from Seoul or Busan and see the same
massive, modern apartment complexes that are found in the city
center. After that it's the stark farms of Kansas. Japan appears more
like the US in that the city graduates steadily to suburb and farm.</p>
<p><img src="https://cdn.japantimes.2xx.jp/wp-content/uploads/2013/09/wn20130918n2a-870x580.jpg" alt="Apartments in Seoul" />
<small>Apartments in Seoul, source: Japan Times</small></p>
<p>In general, buildings in South Korea are fairly homogeneous. Even the
downtown areas of Seoul have little architectural creativity. Tokyo
and NYC are both diverse in building styles and sizes. However, NYC
takes the cake for ubiquity of massive towers. In fact, the first time
my South Korean father-in-law visited Manhattan he was blown away by
this mass.</p>
<p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/8e/Manhattan_Skyline_night.jpg/800px-Manhattan_Skyline_night.jpg" alt="Manhattan skyline">
<small>New York City, source: Wikipedia</small></p>
<p>The most popular neighborhoods in Tokyo seem more developed than their
Seoul counterparts; the mass of stores and crowds extends further. And
while the average age of buildings in Tokyo seems younger than the
average age of buildings throughout Seoul (including less desirable
areas), the developed areas (including buildings and streets) of Seoul
are significantly cleaner and more modern. In contrast, and on average,
Tokyo buildings seem as old as NYC buildings.</p>
<p><img src="https://cdn.fodors.com/wp-content/uploads/2018/02/Tokyo-Neighborhoods-Along-Arakawa-Streetcar-1.jpg" />
<small>Tokyo, source: Fodors</small></p>
<h4 id="air-quality">Air quality</h4><p>Air quality in <a href="https://www.numbeo.com/pollution/in/New-York">NYC</a> and
<a href="https://www.numbeo.com/pollution/in/Tokyo">Tokyo</a> is high and pollution
is low. But in <a href="https://www.upi.com/Fine-dust-levels-soar-in-South-Korea/5581523776231/">recent
times</a>,
air quality in Seoul has deteriorated with dangerous levels of fine
dust from factories in South Korea and China. It is not clear when or
how the South Korean government will address this.</p>
<h3 id="wifi/cafes">WiFi/Cafes</h3><p>My idea of a good cafe is a decent ratio of seats to traffic,
available electrical outlets, and decent WiFi. NYC and Tokyo have some
similarities: chain coffee shops are larger and non-chains are often
pretty small. Tokyo differs from NYC in that there are few electrical
outlets and in the existence of interior smoking sections. (Tokyo bans
smoking while walking but designates smoking areas in parks and in inner
rooms of restaurants and cafes.)</p>
<p>But the WiFi in Tokyo is abysmal. Many cafes do not have it (though
the trend is to provide) and even the chains that do provide it have
terrible speeds reaching peaks of 5Mbps down. In NYC WiFi is available
at ~20Mbps down at most chains and ~5Mbps at smaller non-chains.</p>
<p>In contrast, South Korea is the jewel of cafe culture. Unlike in the
US, where coffee shop size decreases as population increases, coffee
shops in South Korea are oddly enormous everywhere. South Korea is
rich with local shops, domestic chains (including the exported Paris
Baguette and Tous Les Jours), and foreign chains (South Korea has the
highest number of Starbucks Reserve stores per capita of any country).</p>
<p><img src="https://file.mk.co.kr/meet/neds/2018/06/image_readtop_2018_402044_15299876083365412.jpg" alt="" />
<small>Starbucks Reserve in Seoul, source: Pulse News</small></p>
<p>From Jeju Island to Seoul we never worried about a seat or an outlet
at a cafe. Furthermore, the WiFi in South Korea is incredible. My
tech-hopeless in-laws’ basic internet plan got 80Mbps down and the
small cafes near their apartment got at least 40Mbps down.</p>
<p>NYC falls closer to Seoul in terms of ubiquity and speed of WiFi and
has the added benefit of city-provided outdoor WiFi that is surprisingly
fast and available throughout the city. NYC is much worse in terms of
opening hours: most cafes close between 8 and 10pm whereas cafes in Seoul
and Tokyo easily stay open past 11pm.</p>
<h4 id="caveat">Caveat</h4><p>It’s not exactly fair to exclude internet cafes, prevalent in both
Seoul and Tokyo (oddly even NYC has a
<a href="https://www.google.com/maps?q=nyc+internet+cafe&um=1&ie=UTF-8&sa=X&ved=0ahUKEwjE0-PxuZTeAhWTdXAKHYDFB1cQ_AUIDigB">few</a>). At
an internet cafe in Tokyo you can expect abundant outlets and
excellent WiFi (I saw peaks of 40Mbps down). I did not visit an
internet cafe in Seoul but I expect it to be similar. In both Seoul
and Tokyo you can easily find 24/7 service (with showers!?).</p>
<p>I did not include internet cafes above because I find them slightly
less convenient for tourists. Though credit is due: unlike American
Chinatown internet cafes, the ones we visited in Tokyo were very
clean, spacious and warm.</p>
<p><img src="http://rakutama.com/en/images/shop/koriyama.jpg" alt="Internet cafe in Shinjuku" />
<small>Internet cafe in Shinjuku, source: Rakutama</small></p>
<h3 id="food">Food</h3><p>Dining out in NYC is similar in cost to other major US cities. The
quality is usually pretty good. Tokyo was about as expensive as food
in NYC and generally as high quality. For instance, most dinners in
NYC and Tokyo cost about $40-60 for two people. In contrast, most
entrees in Seoul are sold for two and the dinner in total was often
about $20-40. Restaurants on average seemed to be lower quality in
Seoul compared to Tokyo and New York, but there are still more than
enough high quality options.</p>
<h3 id="language">Language</h3><p>I am biased having a better knowledge of Korean than Japanese and a
South Korean partner to fall back on. But I believe South Korea is the
more friendly place for an English speaker in that it is more
dedicated to providing English translations and that the written
language is simpler. In both cities the penetration of
English-speaking natives (and quality of speech and comprehension) is
indistinguishable and decent.</p>
<p>To the first point, even the oddest locations and obscure signage had
English translations in South Korea (not just Seoul) — not so even
within Tokyo.</p>
<p>To the second point, Japanese has three writing systems (kanji,
hiragana, and katakana). Kanji (characters originating from Chinese)
cannot be replaced in writing by phonetic counterparts in hiragana or
katakana. So you have little choice but to memorize all important
characters, disregarding the fact that many characters can be broken
down. Then you must also memorize the alphabetic systems of hiragana
and katakana.</p>
<p>In contrast, Korean has two writing systems (hangul and hanja) where
hanja (characters originating from Chinese) is primarily used in
formal settings (government forms, academic books, etc.) and can be
replaced with the phonetic equivalent in hangul.</p>
<p>This makes it much simpler to memorize and read Korean compared to
Japanese.</p>
<h3 id="assorted-recommendations">Assorted recommendations</h3><p>For New Yorkers, don’t stay in the recommended areas of
Shinjuku/Shibuya/Roppongi unless you’re the type who’d enjoy staying
around Times Square. These three areas of Tokyo are just as obnoxious
albeit much safer. I also don’t recommend the Harajuku area; it is
extra. There’s no real equivalent level of crazy in Seoul although
Hongdae comes close.</p>
<p>In a future Tokyo trip I’d stick to the Meguro Station area including
Ebisu and Daikanyama. They are beautiful, quiet neighborhoods with
lots of restaurants and cafes beside the Meguro river. Areas along the
Sumida River are also beautiful and quiet. Ginza/Tokyo Station is
also a fun-but-not-obnoxious area to visit.</p>
<p><img src="https://odis.homeaway.com/odis/listing/f3fd8dfd-c29e-4ab3-a0cd-19a99bdc3c7f.c10.jpg" alt="Ebisu">
<small>Ebisu, source: Homeaway</small></p>
<p>I cannot recommend the Edo-Tokyo Museum enough; it is the best city
museum I've visited. Tsukiji is also a must-see, reminding me how much
I miss going to Reading Terminal Market each weekend in Philly.</p>
<p>In Seoul I’d recommend Yeonnam-Dong, Itaewon (which is much nicer than
it’s made out to be), and Gwanghwamun. Mapo-Gu in general is a great
region of Gangbuk as is the area below it (near Yeouido) in
Gangnam.</p>
<p><img src="https://i.imgur.com/ttdg5Y7.jpg?maxwidth=640" alt="Yeonnam-dong" />
<small>Yeonnam-Dong, source: Phil Eaton</small></p>
<p>I recommend visiting the National Museum of Korea in Seoul as well as
Hangang Park and Gyeongui Line Forest Park. The areas around the
Tancheon stream flowing south to Bundang are also beautiful.</p>
<p><img src="https://misadventuresofanawkwardamerican.files.wordpress.com/2014/05/dscn05912.jpg" alt="Tancheon near Bundang" />
<small>Tancheon near Bundang, source: Misadventures of an Awkward American</small></p>
<h3 id="conclusion">Conclusion</h3><p>I came to Tokyo with the expectation of a highly modern city fused
with Eastern culture. But it is difficult to see many ways it is ahead
of NYC technically and it is very similar to NYC culturally. In some
ways Tokyo even seems a little stuck in the past or just... off. Why
are all vending machines [e.g. for tickets, ordering food, etc.]
mechanical and not touch screens? The National Museum of Science is
awfully old and ugly, the National Diet Building the same.</p>
<p>So on the one hand I’d like to let the next person down lightly on the
excitement of Japan. It is a world-class city with great restaurants,
live music and refined culture but all-in-all very similar to NYC. On
the other hand I recommend Seoul for a cheaper, cleaner, more
English-speaker friendly, and genuinely novel city with splashes of
"Eastern" romantic elements like Tokyo.</p>
<p><img src="http://www.englishspectrum.com/wp-content/uploads/2015/03/yeoido.JPG-1.jpg" alt="Cherry blossoms in Seoul" />
<small>Cherry blossoms in Seoul, source: English Spectrum</small></p>
<h3 id="further-reading">Further reading</h3><p><a href="https://amzn.to/2PNOsih">MITI and the Japanese Miracle: The Growth of Industrial Policy,
1925-1975</a> is an excellent, albeit somewhat
disputed, introduction to the modern Japanese economy.</p>
<p><a href="https://amzn.to/2EIw6hc">Asia’s Next Giant: South Korea and Late
Industrialization</a> is a similar high-quality
introduction to the South Korean economy.</p>
<p>If you’re only familiar with
US/Canadian companies or other “pure” market economies these two books
are a great read on different, challenging styles of government
policy, corporate structure, and life.</p>
<p class="note">
P.s. I’m looking for book recommendations on the last 20 years of
economic/political history in Japan and South Korea and on the last
100 years of economic/political history in the US and NYC.
</p>
http://notes.eatonphil.com/on-nyc-tokyo-and-seoul.htmlSat, 20 Oct 2018 00:00:00 +0000
- Why (and how) to read bookshttp://notes.eatonphil.com/why-and-how-to-read-books.html<p>The last time I read for fun was in elementary school. Since college I
have known I should read more, but I never forced myself to build the
habit. Then three years ago I spent time around my brother and a
coworker who were avid readers. This "peer pressure" helped me get
started.</p>
<p>Since I started, I've seen concrete improvements in vocabulary. I find
myself using words I didn't know I knew. I question my choice of words
more. And I understand coworkers a little better. Perhaps it is only
personal style, but I've also become more aware of hyperbole in my
speech and have begun to tone that down.</p>
<p>Specifically, books provide more density of information than I can
pull together myself. I've also benefited heavily from reading books
on tools I use daily. Contrary to being boring, a book on a topic with
which I'm familiar has been a (often needed) break from books on
topics with which I am unfamiliar. The former category might include
books on CSS, Bash, Emacs, Python, Scheme, data modeling,
Linux/FreeBSD system administration, mystery novels, and so on. The
latter category might include books on Common Lisp, system
architecture, the implementation of Linux/FreeBSD, behavioral
psychology, management, stock/bond markets, the history of
Argentina/Chile/South Korea/Japan, sci-fi novels, and so on.</p>
<p>Reading diversely exposes how little I know. And that can be
depressing. But I'm fairly confident reading books is the fastest way
to grow.</p>
<p>Tactically speaking, I started slowly with few books and the ones
easiest for me to read. The first year I read two books, both
technical. The second year I read nine books and was able to start
branching out beyond technical books. Last year I read a much more
diverse set of forty books. And this year I followed suit with
forty-one so far (on track for fifty-five or so).</p>
<p>I keep track of books I'm reading and books I want to read in
<a href="https://www.goodreads.com/eatonphil">Goodreads</a>. I particularly enjoy
their reading challenge system that lets you know if you are on track
to meet your reading goal for the year.</p>
http://notes.eatonphil.com/why-and-how-to-read-books.htmlWed, 26 Sep 2018 00:00:00 +0000
- Compiling dynamic programming languageshttp://notes.eatonphil.com/compiling-dynamic-programming-languages.html<p>It can be difficult to shake the idea that dynamically typed
programming languages are tied to byte-code interpreters (e.g. YARV
Ruby, CPython, V8, Zend Engine, etc.). But for many languages, a
compiled implementation also exists. Cython, Chicken Scheme and SBCL
are good examples.</p>
<p>In this post I will briefly describe how I built a compiler for my
<a href="https://github.com/eatonphil/bsdscheme">Scheme implementation</a> using
artifacts from the interpreter. In doing this, I learned a simple (not
novel) technique for compiling dynamic languages. I'll introduce the
<a href="https://github.com/eatonphil/jsc">JavaScript to C++/V8 compiler</a> I
am developing using this technique.</p>
<h3 id="bsdscheme">BSDScheme</h3><p>For the past year I've developed a Scheme implementation,
<a href="https://github.com/eatonphil/bsdscheme">BSDScheme</a>. I started with an
AST-interpreter (as opposed to a byte-code compiler and VM). A more
detailed blog post on the first few steps writing BSDScheme can be
found
<a href="http://notes.eatonphil.com/first-few-hurdles-writing-a-scheme-interpreter.html">here</a>.</p>
<p>As I built up support for the various objects and operations in the
language, I accumulated a sizeable base of D code for the BSDScheme
runtime. This included an object representation for primitive types
(and support for converting to and from types in D) as well as basic
Scheme operations
(<code>+</code>, <code>-</code>, <code>car</code>, <code>cdr</code>,
etc.).</p>
<p>When the time came to implement a compiler backend, I only needed to
do codegen since the parser already existed. Furthermore, the
fundamental bits had already been written: object representation and
much of the standard library. So I wrote the simplest compiler I could
think of by targeting D and the objects/functions I had already
written to support the interpreter.</p>
<p>Take, for example, the <code>equals</code>
<a href="https://github.com/eatonphil/bsdscheme/blob/master/src/common.d#L140">function</a>
in the standard library:</p>
<div class="highlight"><pre><span></span><span class="n">Value</span><span class="w"> </span><span class="nf">equals</span><span class="p">(</span><span class="n">Value</span><span class="w"> </span><span class="n">arguments</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="o">**</span><span class="w"> </span><span class="n">rest</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">tuple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueToList</span><span class="p">(</span><span class="n">arguments</span><span class="p">);</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tuple</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">car</span><span class="p">(</span><span class="n">tuple</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">b</span><span class="p">;</span>
<span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">tagOfValue</span><span class="p">(</span><span class="n">left</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">Integer</span><span class="p">:</span>
<span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueIsInteger</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">valueToInteger</span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">valueToInteger</span><span class="p">(</span><span class="n">right</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">Char</span><span class="p">:</span>
<span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueIsChar</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">valueToChar</span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">valueToChar</span><span class="p">(</span><span class="n">right</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">String</span><span class="p">:</span>
<span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueIsString</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">valueToString</span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">valueToString</span><span class="p">(</span><span class="n">right</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">Symbol</span><span class="p">:</span>
<span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueIsSymbol</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">valueToSymbol</span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">valueToSymbol</span><span class="p">(</span><span class="n">right</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">Function</span><span class="p">:</span>
<span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueIsFunction</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">valueToFunction</span><span class="p">(</span><span class="n">left</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">valueToFunction</span><span class="p">(</span><span class="n">right</span><span class="p">)[</span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">Bool</span><span class="p">:</span>
<span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueIsBool</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">valueToBool</span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">valueToBool</span><span class="p">(</span><span class="n">right</span><span class="p">);</span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span>
<span class="w"> </span><span class="k">default</span><span class="o">:</span>
<span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">makeBoolValue</span><span class="p">(</span><span class="n">b</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<p>So long as my compiler generated code that used the <code>Value</code>
object to represent Scheme data, I already had an <code>equals</code>
function and large swaths of a Scheme standard library that I could
share between the compiler and interpreter.</p>
<p>Ultimately I only needed to implement a few control structures to
support compiling a large subset of what I supported in the
interpreter. The key aspects here include: function definitions (in
D), function calls (D function calls), if/else (if/else in D) and so
on.</p>
<p>To give a concrete example of a whole program compiled, this Scheme program:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nb">exp</span><span class="w"> </span><span class="nv">base</span><span class="w"> </span><span class="nv">pow</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="nv">pow</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span>
<span class="w"> </span><span class="mi">1</span>
<span class="w"> </span><span class="p">(</span><span class="nb">*</span><span class="w"> </span><span class="nv">base</span><span class="w"> </span><span class="p">(</span><span class="nb">exp</span><span class="w"> </span><span class="nv">base</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">pow</span><span class="w"> </span><span class="mi">1</span><span class="p">)))))</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">main</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nb">display</span><span class="w"> </span><span class="p">(</span><span class="nb">exp</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="mi">16</span><span class="p">))</span>
<span class="p">(</span><span class="nb">newline</span><span class="p">))</span>
</pre></div>
<p>when run through the BSDScheme compiler would become:</p>
<div class="highlight"><pre><span></span><span class="k">import</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">stdio</span><span class="p">;</span>
<span class="k">import</span><span class="w"> </span><span class="n">lex</span><span class="p">;</span>
<span class="k">import</span><span class="w"> </span><span class="n">common</span><span class="p">;</span>
<span class="k">import</span><span class="w"> </span><span class="n">parse</span><span class="p">;</span>
<span class="k">import</span><span class="w"> </span><span class="n">utility</span><span class="p">;</span>
<span class="k">import</span><span class="w"> </span><span class="n">value</span><span class="p">;</span>
<span class="k">import</span><span class="w"> </span><span class="n">buffer</span><span class="p">;</span>
<span class="n">Value</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="n">Value</span><span class="w"> </span><span class="n">arguments</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="o">**</span><span class="w"> </span><span class="n">ctx</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Value</span><span class="p">[]</span><span class="w"> </span><span class="n">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">listToVector</span><span class="p">(</span><span class="n">arguments</span><span class="p">);</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmp</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">pow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmp</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">equals_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">equals</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">pow</span><span class="p">,</span><span class="w"> </span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">0</span><span class="p">)]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">if_result</span><span class="p">;</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">truthy</span><span class="p">(</span><span class="n">equals_result</span><span class="p">))</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="n">if_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>
<span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">minus_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">minus</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">pow</span><span class="p">,</span><span class="w"> </span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">1</span><span class="p">)]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">exp_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">exp</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">base</span><span class="p">,</span><span class="w"> </span><span class="n">minus_result</span><span class="p">]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">times_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">times</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">base</span><span class="p">,</span><span class="w"> </span><span class="n">exp_result</span><span class="p">]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span>
<span class="w"> </span><span class="n">if_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">times_result</span><span class="p">;</span>
<span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">if_result</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">Value</span><span class="w"> </span><span class="nf">BSDScheme_main</span><span class="p">(</span><span class="n">Value</span><span class="w"> </span><span class="n">arguments</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="o">**</span><span class="w"> </span><span class="n">ctx</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">exp_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">exp</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">2</span><span class="p">),</span><span class="w"> </span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">16</span><span class="p">)]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">display_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">display</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">exp_result</span><span class="p">]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span>
<span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">newline_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">newline</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">newline_result</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">BSDScheme_main</span><span class="p">(</span><span class="n">nilValue</span><span class="p">,</span><span class="w"> </span><span class="n">cast</span><span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="mi">0</span><span class="p">);</span><span class="w"> </span><span class="p">}</span>
</pre></div>
<p>Where <em>every imported function had already been written for the
interpreter</em>. I had only to translate a few lines to D and import/call
these existing libraries. Now I had a small <em>binary</em> of compiled
Scheme.</p>
<p>It was at this point I realized I was using the same technique used by
Cython to compile Python code.</p>
<p class="note">
...the Cython project has approached this problem by means of a
source code compiler that translates Python code to equivalent C
code. This code is executed within the CPython runtime environment,
but at the speed of compiled C and with the ability to call directly
into C libraries.
<a href="http://docs.cython.org/en/latest/src/quickstart/overview.html">
http://docs.cython.org/en/latest/src/quickstart/overview.html
</a>
</p><h3 id="jsc">jsc</h3><p>I played with many PL-research-y languages over the years and wanted
to build something a little more practical. So I took what I
learned writing the BSDScheme compiler and decided to write a
Javascript compiler. Specifically, it would target the easiest backend
I could imagine: C++ using the V8 C++ library and generating a Node
addon.</p>
<p>There already existed well-trodden guides/means of writing Node addons
in C++ so I spent some time trying to hand-compile simple Javascript
programs to C++ and V8. A string in Javascript would become a
<code>v8::String</code> type in C++. A number in Javascript would become
<code>v8::Number</code> in C++ and so forth.</p>
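<p>As a rough illustration of that literal mapping (not code taken from
jsc itself), a hypothetical <code>compileLiteral</code> helper could emit
the C++ expression for each Javascript literal. The V8 calls it emits
(<code>String::NewFromUtf8</code>, <code>Number::New</code>,
<code>True</code>/<code>False</code>) are the ones that show up in jsc's
generated output; the helper name, and the assumption that strings are
plain ASCII and numbers are finite, are made up for this sketch.</p>

```javascript
// Hypothetical helper, not part of jsc: maps a Javascript literal to the
// C++ source text that would construct the matching V8 value.
// Assumes an `isolate` variable is in scope in the generated C++.
function compileLiteral(value) {
  switch (typeof value) {
    case "string":
      // JSON.stringify produces a double-quoted, escaped literal. For simple
      // ASCII input this is also a valid C++ string literal (an assumption
      // of this sketch; other input would need C++-specific escaping).
      return `String::NewFromUtf8(isolate, ${JSON.stringify(value)})`;
    case "number":
      // Assumes a finite number; NaN and Infinity would need special cases.
      return `Number::New(isolate, ${value})`;
    case "boolean":
      return value ? "True(isolate)" : "False(isolate)";
    default:
      throw new Error(`unsupported literal type: ${typeof value}`);
  }
}

console.log(compileLiteral("fib")); // String::NewFromUtf8(isolate, "fib")
console.log(compileLiteral(20));    // Number::New(isolate, 20)
console.log(compileLiteral(false)); // False(isolate)
```

<p>A real compiler would walk the parser's AST rather than inspect
runtime values, but the shape of the translation is the same: each
Javascript construct maps to a snippet of C++ that builds or manipulates
a V8 value.</p>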
<p>I decided to write this compiler in Rust given its roots in (and my
familiarity with) ML and Python. I found a <a href="https://github.com/dherman/esprit">Javascript parser by Dave
Herman</a> and after a few lazy weeks
finally got a "Hello world!" program compiling. Getting my first
program to compile has by far been the hardest part of building jsc.</p>
<p>Let's look at a concrete example of a recursive Fibonacci program
(example/recursion.js in the
<a href="https://github.com/eatonphil/jsc">repo</a>):</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">2</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">fib</span><span class="p">(</span><span class="mf">20</span><span class="p">));</span>
<span class="p">}</span>
</pre></div>
<p>Let's add a call to <code>main()</code> at the end and time this with
Node to get a baseline:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>node<span class="w"> </span>example/recursion.js
<span class="m">6765</span>
node<span class="w"> </span>example/recursion.js<span class="w"> </span><span class="m">0</span>.06s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.02s<span class="w"> </span>system<span class="w"> </span><span class="m">97</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.083<span class="w"> </span>total
</pre></div>
<p>Now let's install jsc to compare. We'll need Rust, Cargo, Node and
node-gyp.</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/eatonphil/jsc
$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>jsc
$<span class="w"> </span>make<span class="w"> </span><span class="o">&&</span><span class="w"> </span>make<span class="w"> </span>install
$<span class="w"> </span>jsc<span class="w"> </span>example/recursion.js
</pre></div>
<p>jsc produces a Javascript entrypoint that imports our addon
(build/recursion.js):</p>
<div class="highlight"><pre><span></span><span class="nx">require</span><span class="p">(</span><span class="s2">"./build/Release/recursion"</span><span class="p">).</span><span class="nx">jsc_main</span><span class="p">();</span>
</pre></div>
<p>And it produces a C++ file that represents the entire program
(build/recursion.cc):</p>
<div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf"><string></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><node.h></span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Boolean</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Context</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Exception</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Function</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">FunctionTemplate</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">FunctionCallbackInfo</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Isolate</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Local</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Null</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Number</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Object</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">String</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">False</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">True</span><span class="p">;</span>
<span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Value</span><span class="p">;</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">fib</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o"><</span><span class="n">Value</span><span class="o">>&</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Isolate</span><span class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="nl">tail_recurse_1</span><span class="p">:</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Context</span><span class="o">></span><span class="w"> </span><span class="n">ctx_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isolate</span><span class="o">-></span><span class="n">GetCurrentContext</span><span class="p">();</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Object</span><span class="o">></span><span class="w"> </span><span class="n">global_3</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ctx_2</span><span class="o">-></span><span class="n">Global</span><span class="p">();</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">Boolean_4</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">>::</span><span class="n">Cast</span><span class="p">(</span><span class="n">global_3</span><span class="o">-></span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"Boolean"</span><span class="p">)));</span>
<span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_5</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">string_tmp_6</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_5</span><span class="p">);</span>
<span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_7</span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">));</span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">string_tmp_8</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_7</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">argv_9</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">i</span><span 
class="o">-></span><span class="n">IsString</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-></span><span class="n">IsString</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">string_tmp_6</span><span class="w"> </span><span class="o"><=</span><span class="w"> </span><span class="n">string_tmp_8</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">False</span><span class="p">(</span><span class="n">isolate</span><span class="p">)))</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">result_10</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Boolean_4</span><span class="o">-></span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_9</span><span class="p">);</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">result_10</span><span class="o">-></span><span class="n">ToBoolean</span><span class="p">()</span><span class="o">-></span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">arg_11</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Number</span><span class="o">>::</span><span 
class="n">Cast</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">FunctionTemplate</span><span class="o">></span><span class="w"> </span><span class="n">ftpl_13</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">fib</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">fn_12</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ftpl_13</span><span class="o">-></span><span class="n">GetFunction</span><span class="p">();</span>
<span class="w"> </span><span class="n">fn_12</span><span class="o">-></span><span class="n">SetName</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"fib"</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">argv_14</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_11</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">result_15</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn_12</span><span class="o">-></span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_14</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">arg_16</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Number</span><span class="o">>::</span><span 
class="n">Cast</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">FunctionTemplate</span><span class="o">></span><span class="w"> </span><span class="n">ftpl_18</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">fib</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">fn_17</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ftpl_18</span><span class="o">-></span><span class="n">GetFunction</span><span class="p">();</span>
<span class="w"> </span><span class="n">fn_17</span><span class="o">-></span><span class="n">SetName</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"fib"</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">argv_19</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_16</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">result_20</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn_17</span><span class="o">-></span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_19</span><span class="p">);</span>
<span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">((</span><span class="n">result_15</span><span class="o">-></span><span class="n">IsString</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">result_20</span><span class="o">-></span><span class="n">IsString</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">>::</span><span class="n">Cast</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">Concat</span><span class="p">(</span><span class="n">result_15</span><span class="o">-></span><span class="n">ToString</span><span class="p">(),</span><span class="w"> </span><span class="n">result_20</span><span class="o">-></span><span class="n">ToString</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">>::</span><span class="n">Cast</span><span class="p">((</span><span class="n">result_15</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">result_20</span><span class="o">-></span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">result_15</span><span class="o">-></span><span class="n">ToNumber</span><span 
class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">result_20</span><span class="o">-></span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-></span><span class="n">Value</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Number</span><span class="o">>::</span><span class="n">Cast</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">))));</span>
<span class="w"> </span><span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">jsc_main</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o"><</span><span class="n">Value</span><span class="o">>&</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">Isolate</span><span class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span>
<span class="nl">tail_recurse_21</span><span class="p">:</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">arg_22</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">20</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">FunctionTemplate</span><span class="o">></span><span class="w"> </span><span class="n">ftpl_24</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">fib</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">fn_23</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ftpl_24</span><span class="o">-></span><span class="n">GetFunction</span><span class="p">();</span>
<span class="w"> </span><span class="n">fn_23</span><span class="o">-></span><span class="n">SetName</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"fib"</span><span class="p">));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">argv_25</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_22</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">result_26</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn_23</span><span class="o">-></span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_25</span><span class="p">);</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">arg_27</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">result_26</span><span class="p">;</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">></span><span class="w"> </span><span class="n">fn_28</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Function</span><span class="o">>::</span><span class="n">Cast</span><span class="p">(</span><span class="n">Local</span><span class="o"><</span><span class="n">Object</span><span class="o">>::</span><span class="n">Cast</span><span class="p">(</span><span class="n">isolate</span><span class="o">-></span><span class="n">GetCurrentContext</span><span class="p">()</span><span class="o">-></span><span class="n">Global</span><span class="p">()</span><span class="o">-></span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"console"</span><span class="p">)))</span><span class="o">-></span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">"log"</span><span class="p">)));</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">argv_29</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_27</span><span class="w"> </span><span class="p">};</span>
<span class="w"> </span><span class="n">Local</span><span class="o"><</span><span class="n">Value</span><span class="o">></span><span class="w"> </span><span class="n">result_30</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn_28</span><span class="o">-></span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_29</span><span class="p">);</span>
<span class="w"> </span><span class="n">result_30</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="w"> </span><span class="nf">Init</span><span class="p">(</span><span class="n">Local</span><span class="o"><</span><span class="n">Object</span><span class="o">></span><span class="w"> </span><span class="n">exports</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="n">NODE_SET_METHOD</span><span class="p">(</span><span class="n">exports</span><span class="p">,</span><span class="w"> </span><span class="s">"jsc_main"</span><span class="p">,</span><span class="w"> </span><span class="n">jsc_main</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">NODE_MODULE</span><span class="p">(</span><span class="n">NODE_GYP_MODULE_NAME</span><span class="p">,</span><span class="w"> </span><span class="n">Init</span><span class="p">)</span>
</pre></div>
<p>Let's time this version:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>node<span class="w"> </span>build/recursion.js
<span class="m">6765</span>
node<span class="w"> </span>build/recursion.js<span class="w"> </span><span class="m">0</span>.16s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.03s<span class="w"> </span>system<span class="w"> </span><span class="m">107</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.175<span class="w"> </span>total
</pre></div>
<p>jsc, over twice as slow, is already falling behind Node. :)</p>
<p>As I increased the number passed to my Fibonacci function, the
compiled program's time to completion got exponentially worse. Node
stayed the same. I decided to try tail-call optimization to close
the performance gap between Node and jsc.</p>
<p>I implemented tail-call optimization for the interpreter in BSDScheme
by putting all functions in a loop that would break if tail-call
elimination was not to happen. It took me a week to implement this and
I never put it in place for the compiler. This time around I was able
to add basic tail-call elimination to jsc in two hours. It works by
emitting <code>label</code>s and <code>goto</code>s in place of a tail
call when applicable.</p>
<p>Here is a tail-call optimized version of the same program
(example/tco.js):</p>
<div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span>
<span class="w"> </span><span class="p">}</span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">b</span><span class="p">);</span>
<span class="p">}</span>
<span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span>
<span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">fib</span><span class="p">(</span><span class="mf">50</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">1</span><span class="p">));</span>
<span class="p">}</span>
</pre></div>
<p>We add a call to <code>main()</code> again for Node and time it:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>node<span class="w"> </span>example/tco.js
<span class="m">12586269025</span>
node<span class="w"> </span>example/tco.js<span class="w"> </span><span class="m">0</span>.06s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.02s<span class="w"> </span>system<span class="w"> </span><span class="m">96</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.080<span class="w"> </span>total
</pre></div>
<p>And compile it with jsc and time it:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>jsc<span class="w"> </span>example/tco.js
$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>node<span class="w"> </span>build/tco.js
<span class="m">12586269025</span>
node<span class="w"> </span>build/tco.js<span class="w"> </span><span class="m">0</span>.07s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.02s<span class="w"> </span>system<span class="w"> </span><span class="m">95</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.087<span class="w"> </span>total
</pre></div>
<p>Well that's not bad at all. :)</p>
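<p>To make the label-and-goto idea concrete, here is a hand-written
sketch of what tail-call elimination does to <code>fib</code>. It is
not jsc's actual output (jsc emits C++), and the
name <code>fibLoop</code> and the <code>next*</code> temporaries are
made up for illustration. The <code>while</code> loop stands in for
the label, and reassigning the parameters stands in for
the <code>goto</code>:</p>
<div class="highlight"><pre><span></span>// Hand-written illustration of tail-call elimination; not jsc output.
function fibLoop(n, a, b) {
  // The top of this loop plays the role of the label.
  while (true) {
    if (n === 0) {
      return a;
    }
    if (n === 1) {
      return b;
    }
    // Instead of calling fib(n - 1, b, a + b), compute the new
    // argument values first, then overwrite the parameters and
    // jump back to the top of the function body.
    const nextN = n - 1;
    const nextA = b;
    const nextB = a + b;
    n = nextN;
    a = nextA;
    b = nextB;
  }
}

console.log(fibLoop(50, 0, 1));
</pre></div>
<p>Running this with Node prints <code>12586269025</code>, the same
result as the recursive version, without growing the stack. The
temporaries matter: all new arguments must be computed before any
parameter is overwritten.</p>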
<h3 id="next-steps-with-jsc">Next steps with jsc</h3><p>jsc has very limited support for... everything. Today I added almost
all primitive numeric operations + equality/inequality operations +
unit tests. jsc does not yet support nested functions, callbacks, or
closures. It supports <code>while</code> loops but not
yet <code>for</code> loops. And I'm not sure if it supports <code>else
if</code>. It does not support arrays or objects let alone
constructors and prototypes. Adding support for these is low-hanging
fruit.</p>
<p>After the low-hanging fruit, more interesting projects for jsc include:</p>
<ul>
<li>generating C++ with embedded V8 rather than only targeting Node addons</li>
<li>type inference or type hinting for generating unboxed functions a la Cython and SBCL</li>
</ul>
http://notes.eatonphil.com/compiling-dynamic-programming-languages.htmlSun, 02 Sep 2018 00:00:00 +0000
- btest: a language agnostic test runnerhttp://notes.eatonphil.com/btest-a-language-agnostic-test-runner.html<p><a href="https://github.com/briansteffens/btest">btest</a> is a minimal,
language-agnostic test runner originally written for testing
compilers. Brian, a former co-worker from Linode, wrote the first
implementation in <a href="https://crystal-lang.org/">Crystal</a> (a compiled
language clone of Ruby) for testing
<a href="https://github.com/briansteffens/bshift">bshift</a>, a compiler
project. The tool accomplished exactly what I needed for my own
language project, <a href="https://github.com/eatonphil/bsdscheme">BSDScheme</a>,
and had very few dependencies. After some issues with Crystal support
in containerized CI environments, and despite some incredible
<a href="https://github.com/briansteffens/btest/pull/5">assistance from</a> <a href="https://github.com/briansteffens/btest/pull/4">the
Crystal community</a>, we
rewrote btest in D to simplify downstream use.</p>
<h3 id="how-it-works">How it works</h3><p>btest registers a command (or commands) to run and verifies the
command output and status for different inputs. btest iterates over
files in a directory to discover test groups and individual tests
within. It supports a limited template language for easily adjusting a
more-or-less similar set of tests. And it supports running test groups
and individual tests themselves in parallel. All of this is managed
via a simple YAML config.</p>
<h3 id="btest.yaml">btest.yaml</h3><p>btest requires a project-level configuration file to declare the test
directory, the command(s) to run per test, etc. Let's say we want to
run tests against a Python program. We create
a <code>btest.yaml</code> file with the following:</p>
<div class="highlight"><pre><span></span><span class="nt">test_path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">tests</span>
<span class="nt">runners</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Run tests with cpython</span>
<span class="w"> </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">python test.py</span>
</pre></div>
<p><code>test_path</code> is the directory in which tests are located.
<code>runners</code> is an array of commands to run per test. We
hard-code <code>test.py</code> as the file to run. It is a
project-level standard file that btest writes to disk in an
appropriate path for each test case.</p>
<h4 id="on-multiple-runners">On multiple runners</h4><p>Using multiple runners is helpful when we want to run all tests with
different test commands or test command settings. For instance, we
could run tests against CPython and PyPy by adding another runner to
the runners section.</p>
<div class="highlight"><pre><span></span><span class="nt">test_path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">tests</span>
<span class="nt">runners</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Run tests with cpython</span>
<span class="w"> </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">python test.py</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Run tests with pypy</span>
<span class="w"> </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">pypy test.py</span>
</pre></div>
<h3 id="an-example-test-config">An example test config</h3><p>Let's create a <code>divide-by-zero.yaml</code> file in
the <code>tests</code> directory and add the following:</p>
<div class="highlight"><pre><span></span><span class="nt">cases</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Should exit on divide by zero</span>
<span class="w"> </span><span class="nt">status</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span>
<span class="w"> </span><span class="nt">stdout</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">Traceback (most recent call last):</span>
<span class="w"> </span><span class="no">File "test.py", line 1, in <module></span>
<span class="w"> </span><span class="no">4 / 0</span>
<span class="w"> </span><span class="no">ZeroDivisionError: division by zero</span>
<span class="w"> </span><span class="nt">denominator</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0</span>
<span class="nt">templates</span><span class="p">:</span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">test.py</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span>
<span class="w"> </span><span class="no">4 / {{ denominator }}</span>
</pre></div>
<p>In this example, <code>name</code> will be printed out when the test
is run. <code>status</code> is the expected integer exit status returned by
running the program. <code>stdout</code> is the entire expected output
written by the program during execution. None of these three fields
is required. If <code>status</code> or <code>stdout</code> is not
provided, btest will skip checking it.</p>
<p>Any additional key-value pairs are treated as template variables
and are substituted wherever they are referenced in the templates
section when the case is run. <code>denominator</code> is the only
such variable we use in this example. When this first (and only) case
is run, <code>test.py</code> will be written to disk
containing <code>4 / 0</code>.</p>
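<p>As a rough illustration of that substitution step, here is a small
JavaScript sketch. It is not btest's implementation (btest is written
in D), and <code>renderTemplate</code> is a hypothetical helper. It
assumes placeholders always look like <code>{{ name }}</code> and that
unknown names should be left alone:</p>
<div class="highlight"><pre><span></span>// Hypothetical helper; a sketch of the idea, not btest's code.
function renderTemplate(template, variables) {
  // Replace each {{ name }} with the matching value from the case.
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, function (match, name) {
    if (Object.prototype.hasOwnProperty.call(variables, name)) {
      return String(variables[name]);
    }
    // Assumption: unknown placeholders are left untouched.
    return match;
  });
}

// The case from divide-by-zero.yaml, written as a plain object.
const testCase = {
  name: "Should exit on divide by zero",
  status: 1,
  denominator: 0
};

console.log(renderTemplate("4 / {{ denominator }}", testCase));
</pre></div>
<p>This prints <code>4 / 0</code>, which is the content written
to <code>test.py</code> for this case.</p>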
<h4 id="templates-section">templates section</h4><p>The <code>templates</code> section is a dictionary allowing us to
specify files to be created with variable substitution. All files are
created in the same directory per test case, so if we want to import
code we can do so with relative paths.</p>
<p><a href="https://github.com/eatonphil/bsdscheme/blob/master/tests/include.yaml">Here</a>
is a simple example of a BSDScheme test that uses this feature.</p>
<h3 id="running-btest">Running btest</h3><p>Run btest from the root directory (the directory
above <code>tests</code>) and we'll see all the grouped test cases
that btest registers and the result of each test:</p>
<div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">btest</span>
<span class="n">tests</span><span class="o">/</span><span class="n">divide</span><span class="o">-</span><span class="k">by</span><span class="o">-</span><span class="n">zero</span><span class="p">.</span><span class="n">yaml</span>
<span class="o">[</span><span class="n">PASS</span><span class="o">]</span><span class="w"> </span><span class="n">Should</span><span class="w"> </span><span class="k">exit</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">divide</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">zero</span>
<span class="mi">1</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">tests</span><span class="w"> </span><span class="n">passed</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nl">runner</span><span class="p">:</span><span class="w"> </span><span class="n">Run</span><span class="w"> </span><span class="n">tests</span><span class="w"> </span><span class="k">with</span><span class="w"> </span><span class="n">cpython</span>
</pre></div>
<h3 id="use-in-ci-environments">Use in CI environments</h3><p>In the future we may provide pre-built release binaries. But in the
meantime, the CI step involves downloading git and ldc and
building/installing btest before calling it.</p>
<h4 id="circle-ci">Circle CI</h4><p>This is the
<a href="https://github.com/eatonphil/bsdscheme/blob/master/.circleci/config.yml">config</a>
file I use for testing BSDScheme:</p>
<div class="highlight"><pre><span></span><span class="n">version</span><span class="o">:</span><span class="w"> </span><span class="mi">2</span>
<span class="n">jobs</span><span class="o">:</span>
<span class="w"> </span><span class="n">build</span><span class="o">:</span>
<span class="w"> </span><span class="n">docker</span><span class="o">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">image</span><span class="o">:</span><span class="w"> </span><span class="n">dlanguage</span><span class="o">/</span><span class="n">ldc</span>
<span class="w"> </span><span class="n">steps</span><span class="o">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">checkout</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">run</span><span class="o">:</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Install</span><span class="w"> </span><span class="n">debian</span><span class="o">-</span><span class="n">packaged</span><span class="w"> </span><span class="n">dependencies</span>
<span class="w"> </span><span class="n">command</span><span class="o">:</span><span class="w"> </span><span class="o">|</span>
<span class="w"> </span><span class="n">apt</span><span class="w"> </span><span class="n">update</span>
<span class="w"> </span><span class="n">apt</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="o">-</span><span class="n">y</span><span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="n">build</span><span class="o">-</span><span class="n">essential</span>
<span class="w"> </span><span class="n">ln</span><span class="w"> </span><span class="o">-</span><span class="n">s</span><span class="w"> </span><span class="n">$</span><span class="o">(</span><span class="n">which</span><span class="w"> </span><span class="n">ldc2</span><span class="o">)</span><span class="w"> </span><span class="sr">/usr/local/bin/</span><span class="n">ldc</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">run</span><span class="o">:</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Install</span><span class="w"> </span><span class="n">btest</span>
<span class="w"> </span><span class="n">command</span><span class="o">:</span><span class="w"> </span><span class="o">|</span>
<span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="n">clone</span><span class="w"> </span><span class="n">https</span><span class="o">://</span><span class="n">github</span><span class="o">.</span><span class="na">com</span><span class="sr">/briansteffens/</span><span class="n">btest</span>
<span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="n">btest</span>
<span class="w"> </span><span class="n">make</span>
<span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="n">install</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">run</span><span class="o">:</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Install</span><span class="w"> </span><span class="n">bsdscheme</span>
<span class="w"> </span><span class="n">command</span><span class="o">:</span><span class="w"> </span><span class="o">|</span>
<span class="w"> </span><span class="n">make</span>
<span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="n">install</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">run</span><span class="o">:</span>
<span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Run</span><span class="w"> </span><span class="n">bsdscheme</span><span class="w"> </span><span class="n">tests</span>
<span class="w"> </span><span class="n">command</span><span class="o">:</span><span class="w"> </span><span class="n">btest</span>
</pre></div>
<h4 id="travis-ci">Travis CI</h4><p>This is the
<a href="https://github.com/briansteffens/bshift/blob/master/.travis.yml">config</a>
Brian uses for testing bshift:</p>
<div class="highlight"><pre><span></span><span class="n">sudo</span><span class="o">:</span><span class="w"> </span><span class="n">required</span>
<span class="n">language</span><span class="o">:</span><span class="w"> </span><span class="n">d</span>
<span class="n">d</span><span class="o">:</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">ldc</span>
<span class="n">script</span><span class="o">:</span>
<span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">ldc</span><span class="w"> </span><span class="n">gets</span><span class="w"> </span><span class="n">installed</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">other</span><span class="w"> </span><span class="n">names</span><span class="w"> </span><span class="n">sometimes</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">ln</span><span class="w"> </span><span class="o">-</span><span class="n">s</span><span class="w"> </span><span class="err">`</span><span class="n">which</span><span class="w"> </span><span class="n">$DC</span><span class="err">`</span><span class="w"> </span><span class="sr">/usr/local/bin/</span><span class="n">ldc</span>
<span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">bshift</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">make</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">ln</span><span class="w"> </span><span class="o">-</span><span class="n">s</span><span class="w"> </span><span class="n">$PWD</span><span class="sr">/bin/bshift /usr/local/bin/</span><span class="n">bshift</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">ln</span><span class="w"> </span><span class="o">-</span><span class="n">s</span><span class="w"> </span><span class="n">$PWD</span><span class="sr">/lib /usr/local/lib/</span><span class="n">bshift</span>
<span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">nasm</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">apt</span><span class="o">-</span><span class="kd">get</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="o">-</span><span class="n">y</span><span class="w"> </span><span class="n">nasm</span>
<span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">basm</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="n">clone</span><span class="w"> </span><span class="n">https</span><span class="o">://</span><span class="n">github</span><span class="o">.</span><span class="na">com</span><span class="sr">/briansteffens/</span><span class="n">basm</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="n">basm</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">cabal</span><span class="w"> </span><span class="n">build</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="o">..</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">ln</span><span class="w"> </span><span class="o">-</span><span class="n">s</span><span class="w"> </span><span class="n">$PWD</span><span class="sr">/basm/dist/build/basm/basm /usr/local/bin/</span><span class="n">basm</span>
<span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">btest</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="n">clone</span><span class="w"> </span><span class="n">https</span><span class="o">://</span><span class="n">github</span><span class="o">.</span><span class="na">com</span><span class="sr">/briansteffens/</span><span class="n">btest</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="n">btest</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="o">..</span>
<span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">run</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">tests</span>
<span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">btest</span>
</pre></div>
http://notes.eatonphil.com/btest-a-language-agnostic-test-runner.htmlSat, 04 Aug 2018 00:00:00 +0000
- Writing to be readhttp://notes.eatonphil.com/writing-to-be-read.html<p>There is a common struggle in the writing and maintenance of
documentation, checklists, emails, guides, etc. Each provides immense
value; a document may be the key to an important process. The goal is
to remove barriers -- to encourage understanding and correct
application of what has been noted -- without requiring a change in
the character of the reader. That is, expect reading to be difficult
and people to be lazy. <strong>Don't make things harder for your reader than
need be.</strong></p>
<p>Ignoring imperfections in the <em>ideas</em> transcribed into writing, there
are a few particular aesthetic approaches I take to (hopefully) make
my notes more effective. These ideas have been influenced by readings
on writing, psychology, and user experience. In particular, I
recommend
<a href="https://amzn.to/2rT0dsE">On Writing Well</a>,
<a href="https://amzn.to/2IttNAl">Thinking, Fast and Slow</a>,
and <a href="https://www.nngroup.com/">Nielsen Norman Group</a> research.</p>
<h3 id="language-correctness">Language correctness</h3><p>Spelling and grammatical correctness are low-hanging fruit. They are
easy to achieve. Use full sentences, use punctuation, and capitalize
appropriately. But don't be a grammar stickler unreasonably; language
is flexible and always changing. Don't allow anyone the opportunity to
take your work less seriously by screwing up the basics.</p>
<h3 id="structuring-sentences-and-paragraphs">Structuring sentences and paragraphs</h3><p>Keep your sentences short. And avoid run-on sentences; they are always
difficult to parse. If you use more than two commas in a sentence
(aside from in lists), the sentence is terrible. Split it up. Commas
are often used superfluously. Don't do that.</p>
<p>Remember that if a comma separates two sentences, you can separate
them into two sentences with a period instead. And if you ever have a
list containing another list, separate the outer list with semicolons
instead of commas to provide better differentiation.</p>
<p>Keep your paragraphs short too. In primary school you may have learned
to use 5-8 sentences per paragraph. Don't do so needlessly. 3-5
sentences can be perfectly appropriate. As both sentences and
paragraphs get longer, they appear more intimidating and can
discourage readers from continuing.</p>
<div class="note">
<header class="note-header">Visually speaking</header>
<p>
Make your line height
<a href="https://practicaltypography.com/line-spacing.html">120-145%</a>
the height of the font. Increase the spacing between lines in a
paragraph to make the paragraph less dense and more friendly.
</p>
<p>
Keep contrast high. Don't put very gray (or colored) text on a
white background.
</p>
<p>
Additionally, a number of studies suggest that limiting the width
of text increases readability. For best results, limit the width
such
that <a href="https://baymard.com/blog/line-length-readability">50-75
characters</a> appear per line of text.
</p>
</div><h4 id="don't-put-checklists-in-paragraphs">Don't put checklists in paragraphs</h4><p>If a document describes concrete steps that should be followed exactly
and can be reasonably summarized, don't hide the steps within
paragraphs of text. Instead use an ordered or unordered list to
clearly enumerate the expectations. <strong>You can't expect a checklist to
be followed when it is hidden within the sentences of a paragraph.</strong></p>
<h3 id="structuring-sections">Structuring sections</h3><p>Any document (regardless of type) longer than 3-5 paragraphs should
be broken into sub-sections with summarizing headers to aid
scanning. Use the HTML <code>id</code> attribute to allow a direct link to a
particular section in a long page. If the page has more than two
sections or vertically flows beyond a single screen, consider adding a
table of contents at the top of the page to allow the reader to find
the exact section she needs.</p>
<div class="note">
<header class="note-header">Visually speaking</header>
<p>
Don't put large headers immediately next to each other. It is
disruptive to have multiple lines of large text.
</p>
<p>
I almost completely avoid Github Markdown's h1/# tag because it is
just too large and jarring relative to the rest of the text. It is
often best for the flow of a Github Markdown document to stick to
only h3-h4/###-#### tags for headers, using the h2/## tag for the
document title.
</p>
</div><h3 id="in-summary">In summary</h3><p>The aesthetic flow of a document can help or hurt the experience of a
reader consuming it. Good aesthetic "sense" in this regard can be
boiled down to a few methods that primarily revolve around simplifying
structure and facilitating the rewarding feeling of progress as a
reader reads.</p>
<p>Writing is difficult and takes time to evolve helpfully. The dividends
are paid when process is better followed and questions are readily
clarified in writing without further human intervention. It is
incumbent on those writing and maintaining to organize effectively and
see confusion of the reader as fault of the document, not fault of the
reader. It is easier to change something yourself than to expect
others to change to accommodate you.</p>
http://notes.eatonphil.com/writing-to-be-read.htmlFri, 18 May 2018 00:00:00 +0000
- Writing a simple JSON parserhttp://notes.eatonphil.com/writing-a-simple-json-parser.html<p>Writing a JSON parser is one of the easiest ways to get familiar with
parsing techniques. The format is extremely simple. It's defined
recursively so you get a slight challenge compared to, say, parsing
<a href="https://en.wikipedia.org/wiki/Brainfuck">Brainfuck</a>; and you probably
already use JSON. Aside from that last point, parsing
<a href="https://en.wikipedia.org/wiki/S-expression">S-expressions</a> for Scheme
might be an even simpler task.</p>
<p>If you'd just like to see the code for the library, <code>pj</code>, <a href="https://github.com/eatonphil/pj">check it out
on Github</a>.</p>
<h3 id="what-parsing-is-and-(typically)-is-not">What parsing is and (typically) is not</h3><p>Parsing is often broken up into two stages: lexical analysis and
syntactic analysis. Lexical analysis breaks source input into the
simplest decomposable elements of a language called "tokens".
Syntactic analysis (often itself called "parsing") receives the list
of tokens and tries to find patterns in them to meet the language
being parsed.</p>
<p>Parsing does not determine semantic viability of an input
source. Semantic viability of an input source might include whether or
not a variable is defined before being used, whether a function is
called with the correct arguments, or whether a variable can be
declared a second time in some scope.</p>
<p class="note">
There are, of course, always variations in how people choose to
parse and apply semantic rules, but I am assuming a "traditional"
approach to explain the core concepts.
</p><h4 id="the-json-library's-interface">The JSON library's interface</h4><p>Ultimately, there should be a <code>from_string</code> method that accepts a
JSON-encoded string and returns the equivalent Python dictionary.</p>
<p>For example:</p>
<div class="highlight"><pre><span></span>assert_equal(from_string('{"foo": 1}'),
             {"foo": 1})
</pre></div>
<h3 id="lexical-analysis">Lexical analysis</h3><p>Lexical analysis breaks down an input string into tokens. Comments and
whitespace are often discarded during lexical analysis so you are left
with a simpler input you can search for grammatical matches during the
syntactic analysis.</p>
<p>Assuming a simple lexical analyzer, you might iterate over all the
characters in an input string (or stream) and break them apart into
fundamental, <strong>non-recursively</strong> defined language constructs such as
integers, strings, and boolean literals. In particular, strings
<strong>must</strong> be part of the lexical analysis because you cannot throw away
whitespace without knowing that it is not part of a string.</p>
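<p>To see why, here is a minimal sketch (the input is made up for
illustration) of what goes wrong if whitespace is thrown away before
strings have been recognized:</p>

```python
# Hypothetical input, chosen only to illustrate the point above.
source = '{"title": "hello world", "n": 1}'

# Naively dropping every space before lexing also drops the space
# that belongs to the string value.
naive = source.replace(' ', '')

print(naive)  # {"title":"helloworld","n":1}
```

<p>The structure survives but the string value has silently changed,
which is why the lexer has to know where strings begin and end before
it discards anything.</p>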
<p class="note">
In a helpful lexer you keep track of the whitespace and comments
you've skipped, the current line number and file you are in so that
you can refer back to it at any stage in errors produced by analysis
of the source. <a
href="https://v8project.blogspot.com/2018/03/v8-release-66.html">The
V8 JavaScript engine recently became able to reproduce the exact
source code of a function.</a> This, at the very least, would need
the help of a lexer to make possible.
</p><h4 id="implementing-a-json-lexer">Implementing a JSON lexer</h4><p>The gist of the JSON lexer will be to iterate over the input source
and try to find patterns of strings, numbers, booleans, nulls, or JSON
syntax like left brackets and left braces, ultimately returning
each of these elements as a list.</p>
<p>Here is what the lexer should return for an example input:</p>
<div class="highlight"><pre><span></span><span class="n">assert_equal</span><span class="p">(</span><span class="n">lex</span><span class="p">(</span><span class="s1">'{"foo": [1, 2, {"bar": 2}]}'</span><span class="p">),</span>
             <span class="p">[</span><span class="s1">'{'</span><span class="p">,</span> <span class="s1">'foo'</span><span class="p">,</span> <span class="s1">':'</span><span class="p">,</span> <span class="s1">'['</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">','</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">','</span><span class="p">,</span> <span class="s1">'{'</span><span class="p">,</span> <span class="s1">'bar'</span><span class="p">,</span> <span class="s1">':'</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'}'</span><span class="p">,</span> <span class="s1">']'</span><span class="p">,</span> <span class="s1">'}'</span><span class="p">])</span>
</pre></div>
<p>Here is what this logic might begin to look like:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="n">tokens</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
        <span class="n">json_string</span><span class="p">,</span> <span class="n">string</span> <span class="o">=</span> <span class="n">lex_string</span><span class="p">(</span><span class="n">string</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">json_string</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">json_string</span><span class="p">)</span>
            <span class="k">continue</span>

        <span class="c1"># TODO: lex booleans, nulls, numbers</span>

        <span class="k">if</span> <span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">in</span> <span class="n">JSON_WHITESPACE</span><span class="p">:</span>
            <span class="n">string</span> <span class="o">=</span> <span class="n">string</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
        <span class="k">elif</span> <span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">in</span> <span class="n">JSON_SYNTAX</span><span class="p">:</span>
            <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
            <span class="n">string</span> <span class="o">=</span> <span class="n">string</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Unexpected character: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>

    <span class="k">return</span> <span class="n">tokens</span>
</pre></div>
<p>The goal here is to try to match strings, numbers, booleans, and nulls
and add them to the list of tokens. If none of these match, check if
the character is whitespace and throw it away if so. Otherwise store
it as a token if it is part of JSON syntax (like left
brackets). Finally throw an exception if the character/string didn't
match any of these patterns.</p>
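<p>The snippet above leans on a few constants that are not shown. Here
is one plausible set of definitions, a sketch rather than necessarily
what <code>pj</code> itself uses. The names match the ones the lexer
code in this post refers to; the values are an assumption based on the
JSON grammar.</p>

```python
# Assumed definitions: the names are the ones used by the lexer code
# in this post, the values are a guess at what they need to contain.
JSON_QUOTE = '"'
JSON_WHITESPACE = [' ', '\t', '\n', '\r']
JSON_SYNTAX = ['{', '}', '[', ']', ',', ':']

# Lengths used when matching the literal keywords.
TRUE_LEN = len('true')
FALSE_LEN = len('false')
NULL_LEN = len('null')

print(':' in JSON_SYNTAX)      # True
print('"' in JSON_SYNTAX)      # False
```

<p>The quote character is deliberately kept out of
<code>JSON_SYNTAX</code> here: quotes are consumed while lexing
strings, so they never need to show up as tokens of their own.</p>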
<p>Let's extend the core logic here a little bit to support all the types
and add the function stubs.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex_string</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span>


<span class="k">def</span> <span class="nf">lex_number</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span>


<span class="k">def</span> <span class="nf">lex_bool</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span>


<span class="k">def</span> <span class="nf">lex_null</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span>


<span class="k">def</span> <span class="nf">lex</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="n">tokens</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
        <span class="n">json_string</span><span class="p">,</span> <span class="n">string</span> <span class="o">=</span> <span class="n">lex_string</span><span class="p">(</span><span class="n">string</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">json_string</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">json_string</span><span class="p">)</span>
            <span class="k">continue</span>

        <span class="n">json_number</span><span class="p">,</span> <span class="n">string</span> <span class="o">=</span> <span class="n">lex_number</span><span class="p">(</span><span class="n">string</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">json_number</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">json_number</span><span class="p">)</span>
            <span class="k">continue</span>

        <span class="n">json_bool</span><span class="p">,</span> <span class="n">string</span> <span class="o">=</span> <span class="n">lex_bool</span><span class="p">(</span><span class="n">string</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">json_bool</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">json_bool</span><span class="p">)</span>
            <span class="k">continue</span>

        <span class="n">json_null</span><span class="p">,</span> <span class="n">string</span> <span class="o">=</span> <span class="n">lex_null</span><span class="p">(</span><span class="n">string</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">json_null</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="kc">None</span><span class="p">)</span>
            <span class="k">continue</span>

        <span class="k">if</span> <span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">in</span> <span class="n">JSON_WHITESPACE</span><span class="p">:</span>
            <span class="n">string</span> <span class="o">=</span> <span class="n">string</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
        <span class="k">elif</span> <span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">in</span> <span class="n">JSON_SYNTAX</span><span class="p">:</span>
            <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
            <span class="n">string</span> <span class="o">=</span> <span class="n">string</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Unexpected character: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>

    <span class="k">return</span> <span class="n">tokens</span>
</pre></div>
<h4 id="lexing-strings">Lexing strings</h4><p>For the <code>lex_string</code> function, the gist will be to check if the first
character is a quote. If it is, iterate over the input string until
you find an ending quote. If you don't find an initial quote, return
None and the original string. If you find an initial quote and an ending
quote, return the string within the quotes and the rest of the
unchecked input string.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex_string</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="n">json_string</span> <span class="o">=</span> <span class="s1">''</span>

    <span class="k">if</span> <span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="n">JSON_QUOTE</span><span class="p">:</span>
        <span class="n">string</span> <span class="o">=</span> <span class="n">string</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span>

    <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">string</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">c</span> <span class="o">==</span> <span class="n">JSON_QUOTE</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">json_string</span><span class="p">,</span> <span class="n">string</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">json_string</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">:]</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">json_string</span> <span class="o">+=</span> <span class="n">c</span>

    <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected end-of-string quote'</span><span class="p">)</span>
</pre></div>
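<p>As a quick check of the contract described above (a value plus the
remaining input, or <code>None</code> plus the untouched input), here
is the same function run on two inputs. The value given to
<code>JSON_QUOTE</code> is an assumption; the function body is repeated
only so the snippet runs on its own.</p>

```python
JSON_QUOTE = '"'  # assumed value of the constant used above


def lex_string(string):
    # Same logic as the function above, repeated so this runs on its own.
    json_string = ''

    if string[0] == JSON_QUOTE:
        string = string[1:]
    else:
        return None, string

    for c in string:
        if c == JSON_QUOTE:
            return json_string, string[len(json_string)+1:]
        else:
            json_string += c

    raise Exception('Expected end-of-string quote')


# A string at the front of the input is consumed, closing quote included.
matched = lex_string('"foo": 1}')
# Anything else is handed back untouched for the next matcher to try.
skipped = lex_string('{"foo": 1}')

print(matched)  # ('foo', ': 1}')
print(skipped)  # (None, '{"foo": 1}')
```

<p>Note that this version stops at the first quote it sees, so escape
sequences such as <code>\"</code> inside a string are not handled.</p>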
<h4 id="lexing-numbers">Lexing numbers</h4><p>For the <code>lex_number</code> function, the gist will be to iterate over the
input until you find a character that cannot be part of a number.
(This is, of course, a gross simplification, but being more accurate
will be left as an exercise to the reader.) After finding a character
that cannot be part of a number, return a float or int if you've
accumulated at least one character. Otherwise return
None and the original string input.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex_number</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="n">json_number</span> <span class="o">=</span> <span class="s1">''</span>

    <span class="n">number_characters</span> <span class="o">=</span> <span class="p">[</span><span class="nb">str</span><span class="p">(</span><span class="n">d</span><span class="p">)</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">)]</span> <span class="o">+</span> <span class="p">[</span><span class="s1">'-'</span><span class="p">,</span> <span class="s1">'e'</span><span class="p">,</span> <span class="s1">'.'</span><span class="p">]</span>

    <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">string</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">number_characters</span><span class="p">:</span>
            <span class="n">json_number</span> <span class="o">+=</span> <span class="n">c</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">break</span>

    <span class="n">rest</span> <span class="o">=</span> <span class="n">string</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">json_number</span><span class="p">):]</span>

    <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">json_number</span><span class="p">):</span>
        <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span>

    <span class="k">if</span> <span class="s1">'.'</span> <span class="ow">in</span> <span class="n">json_number</span><span class="p">:</span>
        <span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="n">json_number</span><span class="p">),</span> <span class="n">rest</span>

    <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">json_number</span><span class="p">),</span> <span class="n">rest</span>
</pre></div>
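<p>If you want to take up the exercise, one possible direction is to
match the number grammar from the JSON specification with a regular
expression. This is only a sketch: <code>lex_number_strict</code> and
<code>NUMBER_RE</code> are names made up here, not part of
<code>pj</code>. Among other things it covers inputs such as
<code>1e5</code>, which the simplified version above would hand to
<code>int()</code> and fail on.</p>

```python
import re

# Pattern following the number grammar in the JSON specification:
# optional minus, integer part without leading zeros, optional
# fraction, optional exponent.
NUMBER_RE = re.compile(r'-?(?:0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?')


def lex_number_strict(string):
    # Hypothetical variant of lex_number; the name is made up for this sketch.
    match = NUMBER_RE.match(string)
    if match is None:
        return None, string

    text = match.group(0)
    rest = string[match.end():]

    # A fraction or an exponent means the text cannot be read by int().
    if match.group(1) or match.group(2):
        return float(text), rest
    return int(text), rest


print(lex_number_strict('12, 3]'))    # (12, ', 3]')
print(lex_number_strict('-1.5e3}'))   # (-1500.0, '}')
print(lex_number_strict('true'))      # (None, 'true')
```

<p>It keeps the same return shape as the other lexing functions, so it
could be swapped in without touching <code>lex</code>.</p>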
<h4 id="lexing-booleans-and-nulls">Lexing booleans and nulls</h4><p>Finding boolean and null values is a very simple string match.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex_bool</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="n">string_len</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">string</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">string_len</span> <span class="o">>=</span> <span class="n">TRUE_LEN</span> <span class="ow">and</span> \
       <span class="n">string</span><span class="p">[:</span><span class="n">TRUE_LEN</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'true'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">True</span><span class="p">,</span> <span class="n">string</span><span class="p">[</span><span class="n">TRUE_LEN</span><span class="p">:]</span>
    <span class="k">elif</span> <span class="n">string_len</span> <span class="o">>=</span> <span class="n">FALSE_LEN</span> <span class="ow">and</span> \
         <span class="n">string</span><span class="p">[:</span><span class="n">FALSE_LEN</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'false'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">False</span><span class="p">,</span> <span class="n">string</span><span class="p">[</span><span class="n">FALSE_LEN</span><span class="p">:]</span>

    <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span>


<span class="k">def</span> <span class="nf">lex_null</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
    <span class="n">string_len</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">string</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">string_len</span> <span class="o">>=</span> <span class="n">NULL_LEN</span> <span class="ow">and</span> \
       <span class="n">string</span><span class="p">[:</span><span class="n">NULL_LEN</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'null'</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">True</span><span class="p">,</span> <span class="n">string</span><span class="p">[</span><span class="n">NULL_LEN</span><span class="p">:]</span>

    <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span>
</pre></div>
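<p>An equivalent way to write these two, shown here only as an
alternative sketch, is with <code>str.startswith</code>, which also
removes the need for the length constants:</p>

```python
def lex_bool(string):
    # str.startswith returns False when the input is too short,
    # so no separate length check is needed.
    if string.startswith('true'):
        return True, string[len('true'):]
    if string.startswith('false'):
        return False, string[len('false'):]
    return None, string


def lex_null(string):
    if string.startswith('null'):
        # Returns True (not None) so the caller can tell "matched null"
        # apart from "no match", as in the version above.
        return True, string[len('null'):]
    return None, string


print(lex_bool('true]'))   # (True, ']')
print(lex_bool('fal'))     # (None, 'fal')
print(lex_null('null}'))   # (True, '}')
```

<p>Either way, <code>lex_null</code> cannot return <code>None</code>
for a match because <code>None</code> already means "nothing matched";
the caller is the one that appends the actual <code>None</code>
token.</p>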
<p>And now the lexer code is done! See
<a href="https://github.com/eatonphil/pj/blob/master/pj/lexer.py">pj/lexer.py</a>
for the code as a whole.</p>
<h3 id="syntactic-analysis">Syntactic analysis</h3><p>The syntax analyzer's (basic) job is to iterate over a one-dimensional
list of tokens and match groups of tokens up to pieces of the language
according to the definition of the language. If, at any point during
syntactic analysis, the parser cannot match the current set of tokens up
to a valid grammar of the language, the parser will fail and possibly
give you useful information as to what you gave, where, and what it
expected from you.</p>
<h4 id="implementing-a-json-parser">Implementing a JSON parser</h4><p>The gist of the JSON parser will be to iterate over the tokens
received after a call to <code>lex</code> and try to match the tokens to objects,
lists, or plain values.</p>
<p>Here is what the parser should return for an example input:</p>
<div class="highlight"><pre><span></span><span class="n">tokens</span> <span class="o">=</span> <span class="n">lex</span><span class="p">(</span><span class="s1">'{"foo": [1, 2, {"bar": 2}]}'</span><span class="p">)</span>
<span class="n">assert_equal</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span>
             <span class="p">[</span><span class="s1">'{'</span><span class="p">,</span> <span class="s1">'foo'</span><span class="p">,</span> <span class="s1">':'</span><span class="p">,</span> <span class="s1">'['</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">','</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">','</span><span class="p">,</span> <span class="s1">'{'</span><span class="p">,</span> <span class="s1">'bar'</span><span class="p">,</span> <span class="s1">':'</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">'}'</span><span class="p">,</span> <span class="s1">']'</span><span class="p">,</span> <span class="s1">'}'</span><span class="p">])</span>
<span class="n">assert_equal</span><span class="p">(</span><span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">)[</span><span class="mi">0</span><span class="p">],</span>
             <span class="p">{</span><span class="s1">'foo'</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="p">{</span><span class="s1">'bar'</span><span class="p">:</span> <span class="mi">2</span><span class="p">}]})</span>
</pre></div>
<p>Here is what this logic might begin to look like:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse_array</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">[],</span> <span class="n">tokens</span>


<span class="k">def</span> <span class="nf">parse_object</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">{},</span> <span class="n">tokens</span>


<span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
    <span class="n">t</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>

    <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_LEFTBRACKET</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">parse_array</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
    <span class="k">elif</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_LEFTBRACE</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">parse_object</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">t</span><span class="p">,</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
</pre></div>
<p>A key structural difference between this lexer and parser is that the
lexer returns a one-dimensional array of tokens. Parsers are often
defined recursively and return a recursive, tree-like object. Since
JSON is a data serialization format instead of a language, the parser
should produce objects in Python rather than a syntax tree on which
you could perform more analysis (or code generation in the case of a
compiler).</p>
<p>And, again, the benefit of having the lexical analysis happen
independent from the parser is that both pieces of code are simpler
and concerned with only specific elements.</p>
<h4 id="parsing-arrays">Parsing arrays</h4><p>Parsing arrays is a matter of parsing array members and expecting a
comma token between them or a right bracket indicating the end
of the array.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse_array</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
    <span class="n">json_array</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="n">t</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_RIGHTBRACKET</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">json_array</span><span class="p">,</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>

    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="n">json</span><span class="p">,</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span>
        <span class="n">json_array</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">json</span><span class="p">)</span>

        <span class="n">t</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_RIGHTBRACKET</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">json_array</span><span class="p">,</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
        <span class="k">elif</span> <span class="n">t</span> <span class="o">!=</span> <span class="n">JSON_COMMA</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected comma after object in array'</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">tokens</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>

    <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected end-of-array bracket'</span><span class="p">)</span>
</pre></div>
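<p>To watch the recursion at work, here is a self-contained sketch that
runs the array logic on a nested token list. The constant values and
the cut-down <code>parse</code> (arrays and plain values only) are
assumptions made so the snippet runs on its own.</p>

```python
# Assumed values for the constants used above.
JSON_COMMA = ','
JSON_LEFTBRACKET = '['
JSON_RIGHTBRACKET = ']'


def parse_array(tokens):
    json_array = []

    t = tokens[0]
    if t == JSON_RIGHTBRACKET:
        return json_array, tokens[1:]

    while True:
        json, tokens = parse(tokens)
        json_array.append(json)

        t = tokens[0]
        if t == JSON_RIGHTBRACKET:
            return json_array, tokens[1:]
        elif t != JSON_COMMA:
            raise Exception('Expected comma after object in array')
        else:
            tokens = tokens[1:]


def parse(tokens):
    # Cut-down parse for this sketch: arrays and plain values only.
    t = tokens[0]
    if t == JSON_LEFTBRACKET:
        return parse_array(tokens[1:])
    return t, tokens[1:]


# Tokens for the input '[1, [2, 3], 4]' followed by one stray token.
tokens = ['[', 1, ',', '[', 2, ',', 3, ']', ',', 4, ']', '}']
value, rest = parse(tokens)

print(value)  # [1, [2, 3], 4]
print(rest)   # ['}']
```

<p>Each call returns the tokens it did not consume, which is what lets
the outer array pick up exactly where the inner one stopped.</p>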
<h4 id="parsing-objects">Parsing objects</h4><p>Parsing objects is a matter of parsing a key-value pair internally
separated by a colon and externally separated by a comma until you
reach the end of the object.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse_object</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
    <span class="n">json_object</span> <span class="o">=</span> <span class="p">{}</span>
    <span class="n">t</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
    <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_RIGHTBRACE</span><span class="p">:</span>
        <span class="k">return</span> <span class="n">json_object</span><span class="p">,</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
        <span class="n">json_key</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
        <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">json_key</span><span class="p">)</span> <span class="ow">is</span> <span class="nb">str</span><span class="p">:</span>
            <span class="n">tokens</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected string key, got: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">json_key</span><span class="p">))</span>
        <span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">!=</span> <span class="n">JSON_COLON</span><span class="p">:</span>
            <span class="c1"># Report the token actually found where the colon should be.</span>
            <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected colon after key in object, got: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span>
        <span class="n">json_value</span><span class="p">,</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span>
        <span class="n">json_object</span><span class="p">[</span><span class="n">json_key</span><span class="p">]</span> <span class="o">=</span> <span class="n">json_value</span>
        <span class="n">t</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_RIGHTBRACE</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">json_object</span><span class="p">,</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
        <span class="k">elif</span> <span class="n">t</span> <span class="o">!=</span> <span class="n">JSON_COMMA</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected comma after pair in object, got: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">t</span><span class="p">))</span>
        <span class="n">tokens</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
    <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Expected end-of-object brace'</span><span class="p">)</span>
</pre></div>
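<p>To see how the token list shrinks as an object is parsed, here is a
small self-contained sketch. It assumes the lexer hands over syntax
characters and strings as plain Python strings, and it uses a
simplified stand-in for <code>parse</code> that only understands nested
objects and scalar tokens (the real one also handles arrays). The
sample token list is made up for illustration.</p>

```python
JSON_COMMA = ','
JSON_COLON = ':'
JSON_LEFTBRACE = '{'
JSON_RIGHTBRACE = '}'


def parse(tokens):
    # Simplified stand-in: an opening brace starts an object,
    # anything else is treated as a scalar value.
    t = tokens[0]
    if t == JSON_LEFTBRACE:
        return parse_object(tokens[1:])
    return t, tokens[1:]


def parse_object(tokens):
    json_object = {}

    if tokens[0] == JSON_RIGHTBRACE:
        return json_object, tokens[1:]

    while True:
        json_key = tokens[0]
        if type(json_key) is str:
            tokens = tokens[1:]
        else:
            raise Exception('Expected string key, got: {}'.format(json_key))

        if tokens[0] != JSON_COLON:
            raise Exception('Expected colon after key in object, got: {}'.format(tokens[0]))

        # The value may itself be an object, so recurse through parse.
        json_value, tokens = parse(tokens[1:])
        json_object[json_key] = json_value

        t = tokens[0]
        if t == JSON_RIGHTBRACE:
            return json_object, tokens[1:]
        elif t != JSON_COMMA:
            raise Exception('Expected comma after pair in object, got: {}'.format(t))
        tokens = tokens[1:]


# Hand-written tokens for: {"name": "pj", "nested": {"ok": true}}
tokens = ['{', 'name', ':', 'pj', ',', 'nested', ':', '{', 'ok', ':', True, '}', '}']
result, rest = parse(tokens)
print(result, rest)
```

<p>Each call returns the parsed value together with the tokens it did
not consume, which is what lets the nested object hand control back to
the outer one.</p>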
<p>And now the parser code is done! See
<a href="https://github.com/eatonphil/pj/blob/master/pj/parser.py">pj/parser.py</a>
for the code as a whole.</p>
<h3 id="unifying-the-library">Unifying the library</h3><p>To provide the ideal interface, create the <code>from_string</code> function
wrapping the <code>lex</code> and <code>parse</code> functions.</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">from_string</span><span class="p">(</span><span class="n">string</span><span class="p">):</span>
<span class="n">tokens</span> <span class="o">=</span> <span class="n">lex</span><span class="p">(</span><span class="n">string</span><span class="p">)</span>
<span class="k">return</span> <span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
</pre></div>
<p>And the library is complete! (ish). Check out the <a href="https://github.com/eatonphil/pj">project on
GitHub</a> for the full implementation
including basic testing setup.</p>
<h3 id="appendix-a:-single-step-parsing">Appendix A: Single-step parsing</h3><p>Some parsers choose to implement lexical and syntactic analysis in one
stage. For some languages this can simplify the parsing stage
entirely. Or, in more powerful languages like Common Lisp, it can
allow you to dynamically extend the lexer and parser in one step with
<a href="https://gist.github.com/chaitanyagupta/9324402">reader macros</a>.</p>
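<p>As a rough illustration of the single-stage idea, here is a minimal
sketch of a parser that reads characters directly, with no separate
token list. It is not part of pj, and it only handles a tiny made-up
subset of JSON: nested arrays, integers, and strings without escapes.
The function names are hypothetical.</p>

```python
def skip_ws(text, i):
    # Slicing returns '' past the end, and ''.isspace() is False.
    while text[i:i + 1].isspace():
        i += 1
    return i


def parse_value(text, i=0):
    """Parse one value starting at text[i]; return (value, next_index)."""
    i = skip_ws(text, i)
    c = text[i:i + 1]
    if c == '':
        raise ValueError('Unexpected end of input')
    if c == '[':
        return parse_list(text, i + 1)
    if c == '"':
        end = text.find('"', i + 1)
        if end == -1:
            raise ValueError('Expected end-of-string quote')
        return text[i + 1:end], end + 1
    if c.isdigit() or c == '-':
        j = i + 1
        while text[j:j + 1].isdigit():
            j += 1
        return int(text[i:j]), j
    raise ValueError('Unexpected character: {}'.format(c))


def parse_list(text, i):
    items = []
    i = skip_ws(text, i)
    if text[i:i + 1] == ']':
        return items, i + 1
    while True:
        value, i = parse_value(text, i)
        items.append(value)
        i = skip_ws(text, i)
        c = text[i:i + 1]
        if c == ']':
            return items, i + 1
        if c != ',':
            raise ValueError('Expected comma or end-of-array bracket')
        i += 1


source = '[1, "two", [3, -4], []]'
result, end = parse_value(source)
print(result)
```

<p>The structure mirrors the two-stage parser, but every function now
has to deal with whitespace and character positions itself, which is
the trade-off of skipping the lexer.</p>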
<p class="note">
I wrote this library in Python to make it more accessible to
a larger audience. However, many of the techniques used are more
amenable to languages with pattern matching and support for monadic
operations -- like Standard ML. If you are curious what this same
code would look like in Standard ML, check out the <a
href="https://github.com/eatonphil/ponyo/blob/master/src/Encoding/Json.sml">JSON
code in Ponyo</a>.
</p>
http://notes.eatonphil.com/writing-a-simple-json-parser.htmlSun, 06 May 2018 00:00:00 +0000
- Finishing up a FreeBSD experimenthttp://notes.eatonphil.com/finishing-up-a-freebsd-experiment.html<p>I've been using FreeBSD as my daily driver at work since
December. I've successfully done my job and I've learned a hell of a
lot forcing myself on CURRENT... But there have been a number of issues
with it that have made it difficult to keep using, so I replaced it
with Arch Linux yesterday and I no longer have those issues. This is
not the first time I've forced myself to run FreeBSD and it won't be
the last.</p>
<h3 id="the-freebsd-setup">The FreeBSD setup</h3><p>I have a Dell Developer Edition. It employs full-disk encryption with
ZFS. Not being a "disk-jockey" I cannot comment on how exhilarating an
experience running ZFS is. It didn't cause me any trouble.</p>
<p>It has an Intel graphics card and the display server is X. I use the
<a href="https://stumpwm.github.io">StumpWM</a> window manager and
the <a href="https://github.com/iwamatsu/slim">SLiM</a> login
manager. <a href="https://www.jwz.org/xscreensaver/">xscreensaver</a> handles
locking the screen, <a href="https://feh.finalrewind.org/">feh</a> gives me
background images, <a href="https://github.com/dreamer/scrot">scrot</a> gives me
screenshots, and
<a href="http://recordmydesktop.sourceforge.net/about.php">recordMyDesktop</a>
gives me video screen capture. This list should feel familiar to users
of Arch Linux or other X-supported, bring-your-own-software operating
systems/Linux distributions.</p>
<h4 id="software-development">Software development</h4><p>I primarily work on a web application with Node/PostgreSQL and React/SASS.
I do all of this development locally on FreeBSD. I run other components of
our system in a Vagrant-managed VirtualBox virtual machine.</p>
<h4 id="upgrading-the-system">Upgrading the system</h4><p>Since I'm running CURRENT, I fetch the latest commit on Subversion and
rebuild the FreeBSD system (kernel + user-land) each weekend to get
the new hotness. This takes somewhere between 1 and 4 hours. I start the
process Sunday morning and come back to it after lunch. After the
system is compiled and installed, I update all the packages through
the package manager and deal with fallout from incompatible kernel
modules that send me into a crash/reboot loop on boot.</p>
<p>This is actually the part about running FreeBSD (CURRENT) I love the
most. I've gotten more familiar with the development and distribution
of kernel modules like the WiFi, Graphics, and VirtualBox
drivers. I've learned a lot about the organization of the FreeBSD
source code. And I've gotten some improvements
<a href="https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226015">merged</a>
into the FreeBSD Handbook on how to debug a core dump.</p>
<h3 id="issues-with-freebsd-on-my-hardware">Issues with FreeBSD on my hardware</h3><p>I installed CURRENT in December to get support for new Intel graphics
drivers (which have since been backported to STABLE). The built-in
Intel WiFi card is also new enough that it hadn't been backported to
STABLE. My WiFi ultimately never got more than 2-4Mbps down on the
same networks where my MacBook Pro would get 120-250Mbps down. I even bought
an older Realtek USB WiFi adapter and it fared no differently. My
understanding is that this is because CURRENT turns on enough debug
flags that the entire system is not really meant to be used except
by FreeBSD developers.</p>
<p>It would often end up taking 10-30 seconds for a <code>git push</code> to
happen. It would take minutes to pull new Docker images, etc. This
(like everything else) does not mean you cannot do work on FreeBSD
CURRENT; it just makes it really annoying.</p>
<h4 id="appendix-a---headphones">Appendix A - Headphones</h4><p>I couldn't figure out the headphone jack at all. Configuring outputs
via <code>sysctl</code> and <code>device.hints</code> is either really complicated or
documented in a really complicated way. I posted a few times
in #freebsd on Freenode and got eager assistance but ultimately
couldn't get the headphone jack to produce anything without incredible
distortion.</p>
<p>Of course Spotify has no FreeBSD client and I didn't want to try the
Linux compatibility layer (which may have worked). I tried spoofing
user agents for the Spotify web app in Chrome but couldn't find one
that worked. (I still cannot get a working one on Linux either.) So
I'd end up listening to Spotify on my phone, which would have been
acceptable except that the studio headphones I decided I needed
were immensely under-powered by my phone.</p>
<h4 id="appendix-b---yubikey">Appendix B - Yubikey</h4><p>I couldn't figure out how to give myself non-root access to my Yubikey
which I <em>believe</em> is the reason I ultimately wasn't able to make any
use of it. Though admittedly I don't understand a whit of GPG/PGP or
Yubikey itself.</p>
<h4 id="appendix-c---bhyve">Appendix C - bhyve</h4><p>I really wanted to use
<a href="https://www.freebsd.org/doc/handbook/virtualization-host-bhyve.html">bhyve</a>
as the hypervisor for my CentOS virtual machines instead of
VirtualBox. So I spent 2-3 weekends trying to get it working as a
backend for Vagrant. Unfortunately the best "supported" way of doing
this is to manually mutate VirtualBox-based Vagrant boxes and that
just repeatedly didn't work for me.</p>
<p>When I tried using bhyve directly I couldn't get networking
right. Presumably this is because NAT doesn't work well with wireless
interfaces... And I hadn't put in enough weekends to understand
setting up proxy rules correctly.</p>
<h4 id="appendix-d---synaptics">Appendix D - Synaptics</h4><p>It is my understanding that FreeBSD has its own custom Synaptics
drivers and configuration interfaces. Whether that is the case or not,
the documentation is a nightmare and while I would have loved to punt
to a graphical interface to keep from fat-palming the touchpad
every 30 seconds, none of the graphical configuration tools seemed to
work.</p>
<p>A few weeks ago I think I finally got the synaptics support <em>on</em> but I
couldn't scroll or select text anymore. I also had to disable
synaptics, restart X, enable synaptics, and restart X on each boot for
it to successfully register the mouse. I meant to post in #freebsd on
Freenode where I probably would have found a solution but :shrugs:.</p>
<h4 id="appendix-e---sleep">Appendix E - Sleep</h4><p>Well sleep doesn't really work on any modern operating system.</p>
<h3 id="freebsd-is-awesome">FreeBSD is awesome</h3><p>I enjoy picking on my setup, but it should be impressive that you can
do real-world work on FreeBSD. If I had a 3-4 year old laptop instead
of a 1-2 year old laptop, most of my issues would be solved.</p>
<p>Here are some reasons to like FreeBSD.</p>
<h4 id="less-competition">Less competition</h4><p>This is kind of stupid. But it's easier to find work to do (e.g. docs
to fix, bugs to report, ports to add/update, drivers to test) on
FreeBSD. I'm really disappointed to be back on Linux because I like
being closer to the community and knowing there are ways I can
contribute and learn. It's difficult to find the right combination of
fending/learning for yourself and achieving a certain level of
productivity.</p>
<h4 id="package-management-(culture)">Package management (culture)</h4><p>Rolling packages are really important to me as a developer. When I've
run Ubuntu and Debian desktops in the past, I typically built 5-15
major (to my workflow) components from source myself. This is
annoying. Rolling package systems are both easier to use and easier to
contribute to... The latter point may be a coincidence.</p>
<p>In FreeBSD, packages are rolling and the base system (kernel +
userland) is released every year or two if you run the
recommended/supported "flavors" of FreeBSD (i.e. not CURRENT). If
you're running CURRENT then everything is rolling.</p>
<p>Packages are binary, but you can build them from source if needed.</p>
<h4 id="source">Source</h4><p>FreeBSD has an older code base than Linux does but still manages to be
much better organized. OpenBSD and Minix are even better organized but
I don't group them with mainstream general-purpose
operating systems like FreeBSD and Linux. Linux is an awful mess
and is very intimidating, though I hope to get over that.</p>
<h4 id="old-school-interfaces">Old-school interfaces</h4><p>There's no systemd so starting X is as simple as <code>startx</code> (but you can
enable the login manager service to have it launch on boot). You
configure your network interfaces via <code>ifconfig</code>, <code>wpa_supplicant</code>,
and <code>dhclient</code>.</p>
<h4 id="alternatives">Alternatives</h4><p><a href="https://www.trueos.org/">PCBSD or TrueOS</a> may be a good option for
desktop users but something about the project turns me off (maybe it's
the scroll-jacking website).</p>
<h3 id="picking-arch-linux">Picking Arch Linux</h3><p>In any case, I decided it was time to stop waiting for <code>git push</code> to
finish. I had run Gentoo at work for 3-4 months before I installed
FreeBSD. But I still had nightmares of resolving dependencies during
upgrades. I needed a binary package manager (not hard to find) and a
rolling release system.</p>
<h4 id="installing-arch-stinks">Installing Arch stinks</h4><p>Many of my old coworkers at Linode run Arch Linux at home so I've
looked into it a few times. It absolutely meets my rolling release and
binary packaging needs. But I've been through the installation once
before (and I've been through Gentoo's) and loathed the minutes-long
effort required to set up full-disk encryption. Also, systemd? :(</p>
<h4 id="how-about-void-linux?">How about Void Linux?</h4><p>Void Linux looked promising and avoids systemd (which legitimately
adds complexity and new tools to learn for desktop users with graphics
and WiFi/DHCP networking). It has a rolling release system and binary
packages, but overall didn't seem popular enough. I worried I'd be in
the same boat as in Debian/Ubuntu building lots of packages myself.</p>
<h4 id="what-about-arch-based-distros?">What about Arch-based distros?</h4><p>Eventually I realized <a href="http://antergos.com/">Antergos</a> and
<a href="https://manjaro.org/">Manjaro</a> are two (Distrowatch-rated) popular
distributions that are based on Arch and would provide me with the
installer I really wanted. I read more about Manjaro and found it was
pretty divergent from Arch. That didn't sound appealing. Divergent
distributions like Manjaro and Mint exist to cause trouble. Antergos,
on the other hand, seemed to be a thin layer around Arch including a
graphical installer and its own few package repositories. It seemed
easy enough to remove after the installation was finished.</p>
<h3 id="antergos-linux">Antergos Linux</h3><p>I ran the Antergos installer and the first time around, my touchpad
didn't work at all. I tried a USB mouse (that, to be honest, may have
been broken anyway) but it didn't seem to be recognized. I rebooted
and my touchpad worked.</p>
<p>I tried to configure WiFi using the graphical NetworkManager provided
but it was super buggy. Menus kept expanding and contracting as I
moused over items. And it ultimately never prompted me for a password
to the locked networks around me. (It showed lock icons beside the
locked networks.)</p>
<p>I spent half an hour trying to configure the WiFi manually. After I
got it working (and "learned" all the fun new modern tools like <code>ip</code>,
<code>iw</code>, <code>dhcpcd</code>, <code>iwconfig</code>, and systemd networking), the Antergos
installer would crash at the last step for some error related to not
being able to update itself.</p>
<p>At this point I gave up. The Antergos installer was half-baked, buggy,
and was getting me nowhere.</p>
<h3 id="anarchy-linux">Anarchy Linux</h3><p>Still loath to spend a few minutes configuring disk encryption
manually, I interneted until I found <a href="https://anarchy-linux.org/">Anarchy
Linux</a> (which used to be Arch Anywhere).</p>
<p>This installer seemed even more promising. It is a TUI installer so no
need for a mouse and there are more desktop environments to pick from
(including i3 and Sway) or avoid.</p>
<p>It was a little concerning that Anarchy Linux also intends to be its
own divergent Arch-based distribution, but in the meantime it still
offers support for installing vanilla Arch.</p>
<p>It worked.</p>
<h3 id="life-on-arch">Life on Arch</h3><p>I copied over all my configs from my FreeBSD setup and they all
worked. That's pretty nice (also speaks to the general compatibility
of software between Linux and FreeBSD). StumpWM, SLiM, scrot,
xscreensaver, feh, Emacs, Tmux, ssh, kubectl, font settings,
keyboard bindings, etc.</p>
<p>Getting Powerline working was a little weird. The <code>powerline</code> and
<code>powerline-fonts</code> packages don't seem to install patched fonts
(e.g. <code>Noto Sans for Powerline</code>). I prefer these to the
alternative of specifying multiple fonts for fallbacks because I have
font settings in multiple places (e.g. .Xresources, .emacs, etc) and
the syntax varies in each config. So ultimately I cloned the
<code>github.com/powerline/fonts</code> repo and ran the <code>install.sh</code> script
there to get the patched fonts.</p>
<p>But hey, there's a Spotify client! It works! And the headphone jack
just works after installing <code>alsa-utils</code> and running <code>alsamixer</code>. And
my WiFi speed is 120Mbps-250Mbps down on all the right networks!</p>
<p>I can live with this.</p>
<h3 id="random-background">Random background</h3><p>Each time I join a new company, I try to use the change as an excuse
to force myself to try different workflows and learn something new
tangential to the work I actually do. I'd been a Vim and Ubuntu
desktop user since high school. In 2015, I took a break from work on
the East Coast to live in a school bus in Silver City, New Mexico. I
swapped out my Ubuntu and Vim dev setup for FreeBSD and Emacs. I kept
GNOME 3 because I liked the aesthetic. I spent 6 months with this setup
forcing myself to use it as my daily-driver doing full-stack, contract
web development gigs.</p>
<p>In 2016, I joined Linode and took up the company MacBook Pro. I wasn't
as comfortable at the time running Linux on my MacBook, but a
determined coworker put Arch on his. I was still the only one
running Emacs (everyone else used Vim or VS Code) for Python and React
development.</p>
<p>I joined Capsule8 in late 2017 and put Gentoo on my Dell Developer
Edition. Most people ran Ubuntu on the Dell or macOS. I'd never used
Gentoo on a desktop before but liked the systemd-optional design and
similarities to FreeBSD. I ran Gentoo for 3-4 months but was
constantly breaking it during upgrades, and the monthly, full-system
upgrades themselves took 1-2 days. I didn't have the chops or patience
to deal with it.</p>
<p>So I used FreeBSD for 5 months and now I'm back on Linux.</p>
http://notes.eatonphil.com/finishing-up-a-freebsd-experiment.htmlSat, 28 Apr 2018 00:00:00 +0000
- Book Review: ANSI Common Lisphttp://notes.eatonphil.com/book-review-ansi-common-lisp.html<h4 id="score:-4.5-/-5">Score: 4.5 / 5</h4><p>Paul Graham and his editor(s) are excellent. His prose is light and
easy to follow. The only awkward component of the book's organization
is that he tends to use a concept one section before explicitly
introducing and defining that concept. I'm not sure yet if this is a
good or bad thing.</p>
<h3 id="as-a-learning-resource">As a learning resource</h3><p>Among books recommended to potential Lispers, <em>ANSI Common Lisp</em> is
typically written off. Graham's style of Lisp is called
"non-idiomatic". That's fair: both <em>ANSI Common Lisp</em> and <em>On Lisp</em>
feature aspects of Common Lisp that lend themselves to functional
programming. And as those of you who've read <em>Practical Common Lisp</em>
know, Common Lisp (unlike Scheme) was not designed to be a functional
programming language. Ultimately <em>ANSI Common Lisp</em> covers the same
topics <em>Practical Common Lisp</em> does, if not more. But <em>ANSI Common
Lisp</em> is better written, in less space, and with shorter examples.</p>
<p>I'm impressed at Graham's ability to summarize. There is a graphic
illustrating symbols as a structure composed of a name, a value, a
function, a package, and a property list. Although other resources
(books and otherwise) mention symbols as having one or more of these
components, his graphic was the first representation that clicked for
me. He also provides clarity about packages being namespaces for
<em>names</em> (symbols) not objects or functions.</p>
<p>And toward the end of the book, there is a discussion on the
"instance" abstraction (relative to the class definitions themselves)
being more powerful than plain "objects" that carry around methods
themselves. This has been the single most useful discussion on the
implementation of object-oriented constructs I've read yet.</p>
<h3 id="digression-on-practical-common-lisp">Digression on Practical Common Lisp</h3><p><em>Practical Common Lisp</em> is often called the best introduction to
Common Lisp. After reading both, I'd give <em>Practical Common Lisp</em>
second place or call it a tie. The issue with <em>Practical Common Lisp</em>
is that it takes too long to get anywhere and the practical chapters
themselves are just as much a slog. And for as big as it is,
<em>Practical Common Lisp</em> still doesn't include some major (potentially
confusing) aspects of "modern" Common Lisp like ASDF, Quicklisp,
production deployment strategies, etc.</p>
<p>Even after having read <em>Practical Common Lisp</em> I wasn't really clear
how to pull together all the libraries I needed to get anything real
done (e.g. scripting against an HTTP API or interacting with a SQL
database). This is not to say that <em>Practical Common Lisp</em> is a bad
book; it is a good book. But I definitely don't recommend reading it
without also reading <em>ANSI Common Lisp</em>. And regardless, there are
still a few of those modern concepts neither book covers.</p>
http://notes.eatonphil.com/book-review-ansi-common-lisp.htmlSun, 25 Mar 2018 00:00:00 +0000
- Starting a minimal Common Lisp projecthttp://notes.eatonphil.com/starting-a-minimal-common-lisp-project.html<p>If you've only vaguely heard of Lisp before or studied Scheme in
school, Common Lisp is nothing like what you'd expect. While
functional programming is all the rage in Scheme, Common Lisp was
"expressly designed to be a real-world engineering language rather
than a theoretically 'pure' language" (<a href="http://www.gigamonkeys.com/book/introduction-why-lisp.html">Practical Common
Lisp</a>).
Furthermore, <a href="http://sbcl.org/">SBCL</a> -- a popular implementation --
is a highly optimized compiler that is competitive with
<a href="https://benchmarksgame.alioth.debian.org/u64q/lisp.html">Java</a>.</p>
<h3 id="building-blocks">Building blocks</h3><p>Common Lisp symbols (imagine "first-class" variables/labels) are
encapsulated in namespaces called packages. However, packages don't
account for organization across directories, among other things. So
while packages are a part of the core Common Lisp language, the
"cross-directory" organizational structure is managed by the
(all-but-standard) <a href="https://github.com/fare/asdf">ASDF</a> "systems". You
can think of packages as roughly similar to modules in Python whereas
systems in ASDF are more like packages in Python.</p>
<p>ASDF does not manage non-local dependencies. For that we use
<a href="https://www.quicklisp.org/beta/">Quicklisp</a>, the de facto package
manager. ASDF should come bundled with your Common Lisp installation,
which I'll assume is SBCL (not that it matters). Quicklisp does not
come bundled.</p>
<h3 id="getting-quicklisp">Getting Quicklisp</h3><p>You can follow the notes on the Quicklisp
<a href="https://www.quicklisp.org/beta/">site</a> for installation, but the
basic gist is:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-O<span class="w"> </span>https://beta.quicklisp.org/quicklisp.lisp
$<span class="w"> </span>sbcl<span class="w"> </span>--load<span class="w"> </span>quicklisp.lisp
...
*<span class="w"> </span><span class="o">(</span>quicklisp-quickstart:install<span class="o">)</span>
...
*<span class="w"> </span>^D
$<span class="w"> </span>sbcl<span class="w"> </span>--load<span class="w"> </span><span class="s2">"~/quicklisp/setup.lisp"</span>
...
*<span class="w"> </span><span class="o">(</span>ql:add-to-init-file<span class="o">)</span>
</pre></div>
<h3 id="a-minimal-package">A minimal package</h3><p>Now we're ready to get started. Create a directory using the name of
the library you'd like to package. For instance, I'll create a
"cl-docker" directory for my Docker wrapper library. Then create a
file using the same name in the directory with the ".asd" suffix:</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>~/projects
$<span class="w"> </span>mkdir<span class="w"> </span>cl-docker
$<span class="w"> </span>touch<span class="w"> </span>cl-docker/cl-docker.asd
</pre></div>
<p>It is important for the ".asd" file to share the same name as the
directory because ASDF will look for it in that location (by default).</p>
<p>Before we get too far into packaging, let's write a function we'd like
to export from this library. Edit "cl-docker/docker.lisp" (this name does
not matter) and add the following:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">ps</span><span class="w"> </span><span class="p">()</span>
<span class="w"> </span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">((</span><span class="nv">output</span><span class="w"> </span><span class="p">(</span><span class="nv">uiop:run-program</span><span class="w"> </span><span class="o">'</span><span class="p">(</span><span class="s">"docker"</span><span class="w"> </span><span class="s">"ps"</span><span class="p">)</span><span class="w"> </span><span class="ss">:output</span><span class="w"> </span><span class="ss">:string</span><span class="p">)))</span>
<span class="w"> </span><span class="p">(</span><span class="nb">loop</span><span class="w"> </span><span class="nv">for</span><span class="w"> </span><span class="nv">line</span><span class="w"> </span><span class="nv">in</span><span class="w"> </span><span class="p">(</span><span class="nb">rest</span><span class="w"> </span><span class="p">(</span><span class="nv">cl-ppcre:split</span><span class="w"> </span><span class="s">"(\\n+)"</span><span class="w"> </span><span class="nv">output</span><span class="p">))</span>
<span class="w"> </span><span class="nv">collect</span><span class="w"> </span><span class="p">(</span><span class="nv">cl-ppcre:split</span><span class="w"> </span><span class="s">"(\\s\\s+)"</span><span class="w"> </span><span class="nv">line</span><span class="p">))))</span>
</pre></div>
<p>This uses a portable library, "uiop", that ASDF exposes by default (we
won't need to explicitly import this anywhere because the package is
managed by ASDF). It will run the command "docker ps" in a subprocess
and return the output as a string. Then we use the regex split
function from the "cl-ppcre" library to split the output first into
lines, take all but the first line, and split the lines up based on
two or more whitespace characters.</p>
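<p>If the Lisp is hard to read at this point, the same splitting logic
can be sketched in Python. This is only an illustration of the two
regexes, not part of the library; the sample output below is made up,
and <code>parse_ps</code> is a hypothetical name.</p>

```python
import re

# Made-up sample resembling `docker ps` output (columns padded with 2+ spaces).
SAMPLE = (
    'CONTAINER ID   IMAGE          COMMAND       CREATED       STATUS       NAMES\n'
    '1a2b3c4d5e6f   nginx:latest   "nginx -g"    2 hours ago   Up 2 hours   web\n'
)


def parse_ps(output):
    # Split into lines on runs of newlines. Unlike cl-ppcre:split,
    # re.split keeps a trailing empty string, so filter it out.
    lines = [line for line in re.split(r'\n+', output) if line]
    # Drop the header row, then split each row on two or more whitespace
    # characters so single spaces inside a column survive.
    return [re.split(r'\s\s+', line) for line in lines[1:]]


rows = parse_ps(SAMPLE)
print(rows)
```

<p>Splitting on two or more whitespace characters keeps values like
"2 hours ago" in one piece. The flip side is that an empty column
disappears entirely and shifts the fields after it, so this is a
convenience, not a robust parser.</p>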
<p>Next let's define the package (think module in Python) by editing
"cl-docker/package.lisp" (this name also does not matter):</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defpackage</span><span class="w"> </span><span class="nv">cl-docker</span>
<span class="w"> </span><span class="p">(</span><span class="ss">:use</span><span class="w"> </span><span class="nv">cl</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="ss">:import-from</span><span class="w"> </span><span class="ss">:cl-ppcre</span><span class="w"> </span><span class="ss">:split</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="ss">:export</span><span class="w"> </span><span class="ss">:ps</span><span class="p">))</span>
</pre></div>
<p>Here we state the package's name, say that we want to import all
Common Lisp base symbols into the package, say we want to import the
"split" symbol from the "cl-ppcre" package, and say we only want to
export our "ps" function.</p>
<p>At this point we must also declare within the "cl-docker/docker.lisp"
file that it is a part of this package:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">in-package</span><span class="w"> </span><span class="ss">:cl-docker</span><span class="p">)</span>
<span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">ps</span><span class="w"> </span><span class="p">()</span>
<span class="w"> </span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">((</span><span class="nv">output</span><span class="w"> </span><span class="p">(</span><span class="nv">uiop:run-program</span><span class="w"> </span><span class="o">'</span><span class="p">(</span><span class="s">"docker"</span><span class="w"> </span><span class="s">"ps"</span><span class="p">)</span><span class="w"> </span><span class="ss">:output</span><span class="w"> </span><span class="ss">:string</span><span class="p">)))</span>
<span class="w"> </span><span class="p">(</span><span class="nb">loop</span><span class="w"> </span><span class="nv">for</span><span class="w"> </span><span class="nv">line</span><span class="w"> </span><span class="nv">in</span><span class="w"> </span><span class="p">(</span><span class="nb">rest</span><span class="w"> </span><span class="p">(</span><span class="nv">cl-ppcre:split</span><span class="w"> </span><span class="s">"(\\n+)"</span><span class="w"> </span><span class="nv">output</span><span class="p">))</span>
<span class="w"> </span><span class="nv">collect</span><span class="w"> </span><span class="p">(</span><span class="nv">cl-ppcre:split</span><span class="w"> </span><span class="s">"(\\s\\s+)"</span><span class="w"> </span><span class="nv">line</span><span class="p">))))</span>
</pre></div>
<p>Next let's define the system (ASDF-level, similar to a package in Python)
in "cl-docker/cl-docker.asd":</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">defsystem</span><span class="w"> </span><span class="ss">:cl-docker</span>
<span class="w"> </span><span class="ss">:depends-on</span><span class="w"> </span><span class="p">(</span><span class="ss">:cl-ppcre</span><span class="p">)</span>
<span class="w"> </span><span class="ss">:serial</span><span class="w"> </span><span class="no">t</span>
<span class="w"> </span><span class="ss">:components</span><span class="w"> </span><span class="p">((</span><span class="ss">:file</span><span class="w"> </span><span class="s">"package"</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="ss">:file</span><span class="w"> </span><span class="s">"docker"</span><span class="p">)))</span>
</pre></div>
<p>This defines all the pieces of the system for ASDF: the system name,
the package definition and the component of the package
("cl-docker/docker.lisp"), and tells ASDF to make the "cl-ppcre"
system on disk available to us. We also tell ASDF to process the
components in the order we specified (otherwise it will pick an order
that may not be what we want).</p>
<p>In preparation for times when we don't have the "cl-ppcre" system (or
any other dependencies) on disk, we always load the system indirectly
through Quicklisp (rather than directly via ASDF) so Quicklisp can
fetch any missing dependencies from its repository of systems.</p>
<p>But before then -- unless you put this directory in "~/common-lisp" --
you'll need to register the directory containing the directory of your
system definitions so ASDF (and Quicklisp) know where to look if you
ask to load this system.</p>
<p>To do this, add a ".conf" file to
"~/.config/common-lisp/source-registry.conf.d/" and add the following:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="ss">:tree</span><span class="w"> </span><span class="s">"~/path/to/dir/containing/system/dir"</span><span class="p">)</span>
</pre></div>
<p>So if you had a repo called "cl-docker" in your "~/projects" directory
that contained the "cl-docker" directory we previously created (that,
in turn, contains the "cl-docker.asd", "package.lisp", and
"docker.lisp" files) then you might create
"~/.config/common-lisp/source-registry.conf.d/1-cl-docker.conf" and
add:</p>
<div class="highlight"><pre><span></span><span class="p">(</span><span class="ss">:tree</span><span class="w"> </span><span class="s">"~/projects/cl-docker"</span><span class="p">)</span>
</pre></div>
<h4 id="using-the-system">Using the system</h4><p>Now you can use the library from anywhere on your computer. Enter a
Common Lisp REPL and tell Quicklisp to load the system (and download
any non-local dependencies):</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>sbcl
...
*<span class="w"> </span><span class="o">(</span>ql:quickload<span class="w"> </span><span class="s2">"cl-docker"</span><span class="o">)</span>
To<span class="w"> </span>load<span class="w"> </span><span class="s2">"cl-docker"</span>:
<span class="w"> </span>Load<span class="w"> </span><span class="m">1</span><span class="w"> </span>ASDF<span class="w"> </span>system:
<span class="w"> </span>cl-docker
<span class="p">;</span><span class="w"> </span>Loading<span class="w"> </span><span class="s2">"cl-docker"</span>
..................................................
<span class="o">[</span>package<span class="w"> </span>cl-docker<span class="o">]</span>
<span class="o">(</span><span class="s2">"cl-docker"</span><span class="o">)</span>
*<span class="w"> </span><span class="o">(</span>cl-docker:ps<span class="o">)</span>
</pre></div>
<p>And that's it!</p>
<p>For the complete source of this example package, check out this
<a href="https://gist.github.com/eatonphil/59cdfeb4826c7a12a07d7055f6817a56">Gist</a>.</p>
<h3 id="in-conclusion">In conclusion</h3><p>Common Lisp is easy to work with, and its packages are many and mature.
Configuring an ASDF package is even simpler than configuring a Python
"setup.py". I didn't demonstrate pinning versions of dependencies in
ASDF, but <a href="https://stackoverflow.com/a/21663603/1507139">of course</a>
you can do that too. If any of this -- as simple as it is -- seems
tedious, you can also use Zach Beane's (creator of Quicklisp)
<a href="http://xach.livejournal.com/278047.html">quickproject</a> tool to build
out the structure for you.</p>
<h3 id="resources-for-common-lisp">Resources for Common Lisp</h3><p>You must read <a href="http://www.gigamonkeys.com/book/">Practical Common
Lisp</a>. It is freely available
online. It is one of the best resources I keep referring to in
dealing with simple issues (as a new Lisper, I stumble on a lot of
simple issues).</p>
<p>Paul Graham's <a href="http://www.paulgraham.com/onlisp.html">On Lisp</a> is also
a must-read when you want to get a better understanding of macros in
Lisp. It will help you out with macros in Scheme too. This book is
freely available online, but out of print physically. I sent
<a href="https://www.lulu.com/">Lulu</a> the PDF and I received my physical copy
for under $20 (including shipping).</p>
<p>I'm currently making my way through <a href="http://www.cs.cmu.edu/Groups/AI/html/cltl/cltl2.html">Common Lisp the Language, 2nd
Edition</a> which I
believe is also freely available online. However I don't really
recommend this unless you are interested in implementing Common Lisp
or are dying to learn the standard library (not a bad idea).</p>
<p>Finally, Peter Norvig's <a href="https://github.com/norvig/paip-lisp">Paradigms of Artificial Intelligence
Programming</a> just recently became
freely available online. I haven't yet read it but I'm queuing it
up. Don't let the title scare you: apparently it is primarily
considered a practical guide to Common Lisp, built around
old-school/classical AI that isn't supposed to encumber.</p>
<p class="note">
It
was <a href="https://twitter.com/HexstreamSoft/status/971419419862847494">pointed
out</a> on Twitter that Paul
Graham's <a href="http://www.paulgraham.com/acl.html">ANSI Common
Lisp</a> and the
<a href="http://www.lispworks.com/documentation/lw70/CLHS/Front/Contents.htm">CLHS</a>
are probably better resources for the Common Lisp that exists today
than Common Lisp the Language 2. CLtL2 is pre-standard.
</p><p>Additionally, the <a href="http://lispcookbook.github.io/cl-cookbook/">Common Lisp
Cookbook</a> is a great
resource for Common Lisp recipes. It's been around since 2004 (on
Sourceforge) but has been pretty active recently and has been revived
on Github pages.</p>
<h3 id="on-scheme">On Scheme</h3><p>I've done one or two unremarkable web prototypes in <a href="https://www.call-cc.org/">Chicken
Scheme</a>, an R5RS/R7RS Scheme implementation.
I don't think Chicken Scheme is the best bet for the web (the topic
I'm mostly biased toward) because it has no native-thread support and
there are lighter interpreters out there that are easier to embed
(e.g. in nginx). Chicken Scheme's "niche" is being a generally
high-quality implementation with a great <a href="http://wiki.call-cc.org/chicken-projects/egg-index-4.html">collection of 3rd-party
libraries</a>,
but it is also not the
<a href="https://ecraven.github.io/r7rs-benchmarks/">fastest</a> Scheme you could
choose.</p>
<p>I've worked on a larger web prototype -- a Github issue reporting app
-- in <a href="https://racket-lang.org/">Racket</a>, a derivative of Scheme
R6RS. And I've blogged
<a href="http://notes.eatonphil.com/walking-through-a-basic-racket-web-service.html">favorably</a>
about Racket. It is a
<a href="https://ecraven.github.io/r7rs-benchmarks/">high-performance</a>
interpreter with a JIT compiler, has thread support, and is also well
known for its collection of <a href="https://pkgs.racket-lang.org/">3rd-party
libraries</a>. However, the Racket ecosystem
<a href="https://fare.livejournal.com/188429.html">suffers</a> from the same
issues Haskell's does: libraries and bindings are primarily
proof-of-concept, lacking documentation, tests, and real-world use. Trying to
render "templatized" HTML (like Jinja allows for in Flask) without
using S-exp-based syntax was a nightmare. (Read: there's space for
someone to write a good string templating library.)</p>
<h4 id="sorry,-racket">Sorry, Racket</h4><p>Last point on Racket (because it really is worth looking into):
debugging in that Github issue project was not fun. The backtraces
were mostly useless. Naively I assume this may have to do with the way
Racket optimizes and rewrites functions. I was often left with zero
context to find and correct my errors. But it could very well be I
was making poor use of Racket.</p>
<h4 id="on-the-other-hand">On the other hand</h4><p>Common Lisp (its implementations and ecosystem) seems more robust and
developed. SBCL, with its great performance and native-thread
support, is a promising candidate for backend web development.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a post on putting together a Common Lisp project. It's easy! I also included some of my favorite CL books and a digression on Scheme. <a href="https://t.co/2LEDoFnAjk">https://t.co/2LEDoFnAjk</a> <a href="https://twitter.com/hashtag/commonlisp?src=hash&ref_src=twsrc%5Etfw">#commonlisp</a> <a href="https://twitter.com/hashtag/lisp?src=hash&ref_src=twsrc%5Etfw">#lisp</a> <a href="https://twitter.com/hashtag/scheme?src=hash&ref_src=twsrc%5Etfw">#scheme</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/971398435856371712?ref_src=twsrc%5Etfw">March 7, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/starting-a-minimal-common-lisp-project.htmlMon, 05 Mar 2018 00:00:00 +0000
- Interview with the D Language Blog: BSDSchemehttp://notes.eatonphil.com/project-highlight-bsdscheme.html<head>
<meta http-equiv="refresh" content="4;URL='https://dlang.org/blog/2018/01/20/project-highlight-bsdscheme/'" />
</head><p>This is an external post of mine. Click
<a href="https://dlang.org/blog/2018/01/20/project-highlight-bsdscheme/">here</a>
if you are not redirected.</p>
http://notes.eatonphil.com/project-highlight-bsdscheme.htmlSat, 20 Jan 2018 00:00:00 +0000
- First few hurdles writing a Scheme interpreterhttp://notes.eatonphil.com/first-few-hurdles-writing-a-scheme-interpreter.html<p>I started working on <a href="https://github.com/eatonphil/bsdscheme">BSDScheme</a> last October, inspired to get back
into language implementation after my coworker built <a href="https://github.com/briansteffens/bshift">bshift</a>, a
compiler for a C-like language. BSDScheme is an interpreter for a
(currently small subset of) Scheme written in D. It implements a few
substantial primitive <a href="https://github.com/eatonphil/bsdscheme/blob/c49bb14182f04682a5cda4dd224b853b4fc92e92/src/runtime.d#L422">functions</a> (in under 1000 LoC!). It uses the
same test framework bshift uses, <a href="https://github.com/briansteffens/btest">btest</a>. I'm going to expand here
on some notes I wrote in a <a href="https://www.reddit.com/r/scheme/comments/7nvd1y/my_small_scheme_implementation_in_d/">post</a> on Reddit about issues I faced
during these first few months developing BSDScheme.</p>
<p>Before I get too far, here is a simple exponent function running in
BSDScheme. It demonstrates a few of the basic builtin primitives and
also integers being upgraded to D's <a href="https://dlang.org/phobos/std_bigint.html">std.bigint</a> when an integer
operation produces an integer unable to fit in 64 bits. (See the
<a href="https://github.com/eatonphil/bsdscheme/blob/b202e8b5a24fe4281a06e39241f2be3cd51720fc/src/runtime.d#L99">times</a> and <a href="https://github.com/eatonphil/bsdscheme/blob/b202e8b5a24fe4281a06e39241f2be3cd51720fc/src/runtime.d#L63">plus</a> guards for details; see the <a href="https://github.com/eatonphil/bsdscheme/tree/master/examples">examples</a>
directory for other examples.)</p>
<div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>examples/exp.scm
<span class="o">(</span>define<span class="w"> </span><span class="o">(</span>exp<span class="w"> </span>base<span class="w"> </span>pow<span class="o">)</span>
<span class="w"> </span><span class="o">(</span><span class="k">if</span><span class="w"> </span><span class="o">(=</span><span class="w"> </span>pow<span class="w"> </span><span class="m">0</span><span class="o">)</span>
<span class="w"> </span><span class="m">1</span>
<span class="w"> </span><span class="o">(</span>*<span class="w"> </span>base<span class="w"> </span><span class="o">(</span>exp<span class="w"> </span>base<span class="w"> </span><span class="o">(</span>-<span class="w"> </span>pow<span class="w"> </span><span class="m">1</span><span class="o">)))))</span>
<span class="o">(</span>display<span class="w"> </span><span class="o">(</span>exp<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="m">64</span><span class="o">))</span>
<span class="o">(</span>newline<span class="o">)</span>
$<span class="w"> </span>./bin/bsdscheme<span class="w"> </span>examples/exp.scm
<span class="m">18446744073709551616</span>
</pre></div>
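<p>The promotion to a big integer can be pictured with a small Python
sketch. This is not the D code behind the linked guards; the
<code>("int64", n)</code> / <code>("bigint", n)</code> pairs and the
function names are assumptions made for illustration, with Python's own
unbounded integers standing in for std.bigint.</p>

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def times(a, b):
    """Multiply two (kind, number) pairs, promoting to "bigint" when needed."""
    product = a[1] * b[1]
    if product in range(INT64_MIN, INT64_MAX + 1):
        return ("int64", product)    # still fits in a signed 64-bit word
    return ("bigint", product)       # too large: switch representation


def exp(base, pow):
    """Same recursion as the Scheme example, on tagged values."""
    if pow == 0:
        return ("int64", 1)
    return times(base, exp(base, pow - 1))


print(exp(("int64", 2), 62))
print(exp(("int64", 2), 64))
```

<p>Only the second call crosses the 64-bit boundary, so only its result
carries the "bigint" tag.</p>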
<p>The first big correction I made was to the way values are represented
in memory. I originally implemented BSDScheme's value representation
as a <a href="https://github.com/eatonphil/bsdscheme/pull/3/files#diff-653d5ccdaa287f13a3b2d964da52ab4aL284">struct</a> with a pointer to each possible value type. This
design was simple to begin with but space-inefficient. I modelled a
<a href="https://github.com/eatonphil/bsdscheme/pull/3">redesign</a> after the <a href="https://wiki.call-cc.org/man/4/Data%20representation">Chicken Scheme</a> data representation. It
uses a struct with <a href="https://github.com/eatonphil/bsdscheme/pull/3/files#diff-c586618fe7ea7c64340046e89fd82621R14">two fields</a>, header and data. Both fields are
word-size integers (currently hard-coded as 64 bits). The header
stores type and length information and the data stores data.</p>
<p>In this representation, simple types (integers &lt; 2^63, booleans,
characters, etc.) take up only 128 bits. The integers, booleans, etc.
are placed directly into the 64-bit data field. Other types (larger
integers, strings, functions, etc.) use the data field to store a
pointer to memory allocated in the heap. Getting the conversion of
these complex types right was the trickiest part of this data
representation effort... lots of void-pointer conversions.</p>
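<p>Here is a rough Python sketch of the header/data idea. It is not
BSDScheme's actual layout: the tag numbers, the 8-bit tag / 56-bit length
split, and the <code>HEAP</code> list standing in for heap pointers are
all assumptions made for illustration.</p>

```python
from dataclasses import dataclass

WORD_BITS = 64      # both fields are one machine word wide
LENGTH_BITS = 56    # assumed split: top 8 bits hold the tag, the rest the length
TAG_INTEGER, TAG_BOOLEAN, TAG_STRING, TAG_BIGINT = 1, 2, 3, 4  # hypothetical tags

HEAP = []           # stand-in for heap memory; a list index plays the pointer


@dataclass(frozen=True)
class Value:
    header: int     # type tag and length packed into one word
    data: int       # immediate payload, or a "pointer" into HEAP


def make_header(tag, length=0):
    return tag * 2**LENGTH_BITS + length


def unpack_header(header):
    return divmod(header, 2**LENGTH_BITS)    # (tag, length)


def allocate(obj):
    HEAP.append(obj)
    return len(HEAP) - 1


def make_integer(n):
    if n in range(-(2**63), 2**63):
        # simple type: the integer itself sits in the data word (two's complement)
        return Value(make_header(TAG_INTEGER), n % 2**WORD_BITS)
    # complex type: the data word only points at the real value
    return Value(make_header(TAG_BIGINT), allocate(n))


def make_string(s):
    return Value(make_header(TAG_STRING, len(s)), allocate(s))


def to_python(value):
    tag, _length = unpack_header(value.header)
    if tag == TAG_INTEGER:
        n = value.data
        return n - 2**WORD_BITS if n in range(2**63, 2**64) else n
    if tag == TAG_BOOLEAN:
        return value.data == 1
    return HEAP[value.data]    # follow the "pointer"
```

<p>Every value is exactly two words regardless of type, and only the
complex types pay for a separate allocation.</p>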
<p>The next big fix I made was to simplify the way generic functions
dealt with their arguments. Originally I passed each function its
arguments un-evaluated and left it up to each function to evaluate its
arguments before operating on them. While there was nothing
intrinsically wrong with this method, it was overly complicated and
bug-prone. I refactored the builtin functions into two groups:
<a href="https://github.com/eatonphil/bsdscheme/blob/c49bb14182f04682a5cda4dd224b853b4fc92e92/src/runtime.d#L422">normal</a> functions and <a href="https://github.com/eatonphil/bsdscheme/blob/c3286df73a32da657e780db8f33e845c9f806a9d/src/runtime.d#L435">special</a> functions. Normal function
arguments are <a href="https://github.com/eatonphil/bsdscheme/blob/c3286df73a32da657e780db8f33e845c9f806a9d/src/runtime.d#L399">evaluated</a> before sending the arguments S-expression
to the function. Special functions receive the arguments S-expression
verbatim so they can decide what / when to evaluate.</p>
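<p>The split can be sketched in a few lines of Python. This is an
illustration rather than BSDScheme's D implementation: S-expressions are
modelled as nested Python lists, and the <code>NORMAL</code> and
<code>SPECIAL</code> tables and the builtin names are assumptions.</p>

```python
def evaluate(expr, env):
    """Evaluate a tiny S-expression: an int, a symbol (str), or a list."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    name, args = expr[0], expr[1:]
    if name in SPECIAL:
        # special functions receive the argument list verbatim
        return SPECIAL[name](args, env)
    # normal functions receive arguments that are already evaluated
    return NORMAL[name]([evaluate(arg, env) for arg in args])


def builtin_plus(values):
    return sum(values)


def builtin_equals(values):
    return values[0] == values[1]


def builtin_if(args, env):
    test, then_branch, else_branch = args
    chosen = then_branch if evaluate(test, env) else else_branch
    return evaluate(chosen, env)    # the other branch is never evaluated


def builtin_quote(args, env):
    return args[0]                  # hand the expression back untouched


NORMAL = {"+": builtin_plus, "=": builtin_equals}
SPECIAL = {"if": builtin_if, "quote": builtin_quote}

print(evaluate(["if", ["=", "x", 0], 1, ["+", "x", 1]], {"x": 0}))
```

<p>Normal builtins such as <code>builtin_plus</code> never see the
environment at all, which is where most of the simplification comes
from.</p>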
<p>The last issue I'll talk about in this post was dealing with the AST
representation. When I started out, the easiest way to get things
working was to have an AST representation completely separate from the
representation of BSDScheme values. This won't get you far in
Scheme. In order to (eventually) support macros (and in the meantime
support eval), the AST representation would have to make use of the
value representation. This was the most complicated and confusing
issue so far in BSDScheme. With the switch to recursive data
structures, it was hard to know if an error occurred because I parsed
incorrectly, or recursed over what I parsed incorrectly, or even if I
was printing out what I parsed incorrectly. After a month of embarrassing
pain, I got all the <a href="https://github.com/eatonphil/bsdscheme/pull/5">pieces in place</a>, which set me
up to easily convert my original interpret function into a
generic eval function that I could expose to the language like any
other special function.</p>
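<p>To see why sharing one representation makes eval nearly free, here
is a toy Python sketch. It is not how BSDScheme is written: the reader
and the three supported forms are assumptions, and Python lists, ints,
and strings stand in for the interpreter's value type.</p>

```python
def tokenize(source):
    return source.replace("(", " ( ").replace(")", " ) ").split()


def read(tokens):
    """Consume tokens from the front; return an int, a symbol (str), or a list."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)    # drop the closing paren
        return items
    try:
        return int(token)
    except ValueError:
        return token     # anything else is treated as a symbol


def evaluate(expr):
    if isinstance(expr, int):
        return expr
    head, args = expr[0], expr[1:]
    if head == "quote":
        return args[0]                        # parsed code handed back as data
    if head == "eval":
        return evaluate(evaluate(args[0]))    # compute a value, then run it as code
    if head == "+":
        return sum(evaluate(arg) for arg in args)
    raise ValueError("unknown form: " + str(head))


tree = read(tokenize("(eval (quote (+ 1 2)))"))
print(tree)
print(evaluate(tree))
```

<p>Because <code>read</code> returns the same kind of lists that
<code>evaluate</code> consumes, the eval form is just one more call to
the evaluator.</p>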
<p>One frustrating side-effect of this AST conversion is that since the
parsing stage builds out trees using the internal value
representation, the parsing stage is tied to the interpreter. From
what I can tell, this basically means I have to revert to some
intermediate AST representation or throw away the parser to support a
compiler backend.</p>
<p>Next steps in BSDScheme include converting all the examples into
tests, combining the needlessly split out lexing and parsing stage
into a single read function that can be exposed into the language,
fleshing out R7RS library support, and looking more into LLVM as a
backend.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a full post on the first few hurdles faced writing a Scheme interpreter in D <a href="https://t.co/Cyjy7pk3OB">https://t.co/Cyjy7pk3OB</a> <a href="https://twitter.com/hashtag/scheme?src=hash&ref_src=twsrc%5Etfw">#scheme</a> <a href="https://twitter.com/hashtag/schemelang?src=hash&ref_src=twsrc%5Etfw">#schemelang</a> <a href="https://twitter.com/hashtag/lisp?src=hash&ref_src=twsrc%5Etfw">#lisp</a> <a href="https://twitter.com/hashtag/dlang?src=hash&ref_src=twsrc%5Etfw">#dlang</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/951091952740651008?ref_src=twsrc%5Etfw">January 10, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/first-few-hurdles-writing-a-scheme-interpreter.htmlWed, 10 Jan 2018 00:00:00 +0000
- Deploying FreeBSD on Linode unattended in minuteshttp://notes.eatonphil.com/deploying-freebsd-on-linode-unattended-in-minutes.html<p>I became a FreeBSD user over 2 years ago when I wanted to see what all
the fuss was about. I swapped my y410p dual-booting Windows / Ubuntu
with FreeBSD running Gnome 3. I learned a lot during the transition
and came to appreciate FreeBSD as a user. I soon began running FreeBSD
as my OS of choice on cloud servers I managed. So naturally, when I
started working at Linode a year ago I wanted to run FreeBSD servers
on Linode too.</p>
<p>Linode is a great platform for running random unofficial images
because you have a lot of control over the configuration. I followed
<a href="https://www.linode.com/docs/tools-reference/custom-kernels-distros/install-freebsd-on-linode/">existing</a> <a href="https://forum.linode.com/viewtopic.php?f=20&t=12080">guides</a> closely and was soon able to get a number of
operating systems running on Linodes by installing them manually:
FreeBSD, OpenBSD, NetBSD, Minix3, and SmartOS to date.</p>
<p>Unofficial images come at a cost though. In particular, I became
frustrated having to reinstall using the installer every time I
managed to trash the disk. So over the past year, I spent time trying
to understand the automated installation processes across different
operating systems and Linux distributions.</p>
<p>Unattended installations are tough. The methods for doing them differ
wildly. On Red Hat, Fedora, and CentOS there is <a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/installation_guide/ch-kickstart2">Kickstart</a>. On
Debian and Ubuntu there is <a href="https://wiki.debian.org/DebianInstaller/Preseed">preseeding</a>. Gentoo, Arch, and FreeBSD
don't particularly have a framework for unattended installs, but the
entire installation process is well-documented and inherently
scriptable (if you put in the effort). OpenBSD has
<a href="http://man.openbsd.org/OpenBSD-6.0/man8/autoinstall.8">autoinstall</a>. Trying to understand each and every one of these
potential installation methods was pretty defeating for getting
started on a side-project.</p>
<p>A few weeks ago, I finally had the silly revelation that I didn't need
to script the installation process -- at least initially. I only had
to have working images available somewhere that could be copied to new
Linodes. Some OSs / distributions may provide these images, but there
is no guarantee that they exist or work. If I tested and hosted them
for Linodes, anyone could easily run their own copy.</p>
<p>I began by running the installation process as normal for
FreeBSD. After the disk had FreeBSD installed on it, I rebooted into
<a href="https://www.linode.com/docs/troubleshooting/rescue-and-rebuild/">Finnix</a>, <a href="https://wiki.archlinux.org/index.php/disk_cloning#Create_disk_image">made a compressed disk image</a>, and transferred it to
an "image host" (another Linode in Fremont running an FTP
server). Then I tested the reversal process manually to make sure a
new Linode could grab the image, dd it to a disk, reboot and have a
working filesystem and networking. (This transfer occurs over private
networking to reduce bandwidth costs and thus limits Linode creation
to the datacenter of the image host, Fremont.)</p>
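<p>The image-and-restore round trip can be sketched in Python. This is
not the procedure I actually ran (that was dd from Finnix plus an FTP
host); the function names, the chunk size, and the choice of gzip are
assumptions, and the network transfer between the two steps is left
out.</p>

```python
import gzip
import shutil

CHUNK_SIZE = 1024 * 1024    # copy 1 MiB at a time


def create_image(disk_path, image_path):
    """Read a disk (or any file) and write a gzip-compressed image of it."""
    with open(disk_path, "rb") as disk, gzip.open(image_path, "wb") as image:
        shutil.copyfileobj(disk, image, CHUNK_SIZE)


def restore_image(image_path, disk_path):
    """Decompress an image and write it back out byte for byte."""
    with gzip.open(image_path, "rb") as image, open(disk_path, "wb") as disk:
        shutil.copyfileobj(image, disk, CHUNK_SIZE)
```

<p>On a rescue system, <code>disk_path</code> would be a block device
path (hypothetically something like "/dev/sda"), and the restore should
only ever target a disk you are prepared to overwrite.</p>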
<p>Then it was time to script the process. I looked into the existing
Linode API client wrappers and noticed none of them were
documented. So I took a day to write and document a good part of a
<a href="https://github.com/eatonphil/python3-linode_api3">new Linode Python client</a>.</p>
<p>I got to work and out came the <a href="https://github.com/eatonphil/linode_deploy_experimental">linode-deploy-experimental</a>
script. To run this script, you'll need an <a href="https://www.linode.com/docs/platform/api/api-key/">API token</a>. This
script will allow you to deploy from the hosted images (which now
include FreeBSD 11.0 and OpenBSD 6.0). Follow the example line in the
git repo and you'll have a Linode running OpenBSD or FreeBSD in
minutes.</p>
<p>Clearly there's a lot of work to do on both this script and on the
images:</p>
<ul>
<li>Fremont datacenter has the only image host.</li>
<li>The script does not change the default password: "password123".
You'll want to change this immediately.</li>
<li>The script does not automatically grow the file system after
install.</li>
<li>The TTY config for these images currently requires you to use
Glish instead of Weblish.</li>
<li>And <a href="https://github.com/eatonphil/linode_deploy_experimental/issues">more</a>.</li>
</ul>
<p>Even if many of these issues do get sorted out (I assume they will),
keep in mind that these are unofficial, unsupported images. Some
things will probably never work: backups, password reset, etc. If you
need help, you are probably limited to community support. You can also
find me with any questions (peaton on OFTC). But for me this is at
least a slight improvement on having to run through the install
process every time I need a new FreeBSD Linode.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Deploy FreeBSD and OpenBSD unattended on Linode <a href="https://t.co/j5A46ROqNM">https://t.co/j5A46ROqNM</a> <a href="https://t.co/HSqrIvBMFj">https://t.co/HSqrIvBMFj</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/840736360864591872?ref_src=twsrc%5Etfw">March 12, 2017</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/deploying-freebsd-on-linode-unattended-in-minutes.htmlSat, 11 Mar 2017 00:00:00 +0000
- Walking through a basic Racket web servicehttp://notes.eatonphil.com/walking-through-a-basic-racket-web-service.html<p>Racket is an impressive language and ecosystem. Compared to Python,
Racket (an evolution of Scheme <a href="https://en.wikipedia.org/wiki/Scheme_(programming_language)">R5RS</a>) is three years younger. It is
as concise and expressive as Python but with much more reasonable
syntax and semantics. Racket is also faster in many cases due in part
to:</p>
<ul>
<li><a href="https://docs.racket-lang.org/guide/performance.html#%28part._.J.I.T%29">JIT compilation</a> on x86 platforms</li>
<li>support for both
<a href="https://docs.racket-lang.org/reference/threads.html">concurrency</a> and <a href="https://docs.racket-lang.org/reference/places.html">parallelism</a></li>
<li>support for <a href="https://docs.racket-lang.org/ts-guide/optimization.html">optimizing</a> statically-typed code</li>
</ul>
<p>Furthermore, the built-in web server libraries <strong>and</strong> database
drivers for MySQL and PostgreSQL are fully asynchronous. This last bit
drove me here from <a href="https://www.playframework.com/documentation/2.6.x/ThreadPools#Knowing-when-you-are-blocking">Play / Akka</a>. (But strong reservations about
the complexity of Scala and the ugliness of Play in Java helped too.)</p>
<p>With this motivation in mind, I'm going to break down the simple web
service <a href="https://docs.racket-lang.org/web-server/stateless.html#%28part._stateless-example%29">example</a> provided in the Racket manuals. If you don't see
the following code in the linked page immediately, scroll down a bit.</p>
<div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">web-server</span>
<span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="nv">web-server/http</span><span class="p">)</span>
<span class="p">(</span><span class="nb">provide</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="nv">stuffer</span><span class="w"> </span><span class="nv">start</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="ss">'stateless</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">stuffer</span>
<span class="w"> </span><span class="p">(</span><span class="nf">stuffer-chain</span>
<span class="w"> </span><span class="nv">serialize-stuffer</span>
<span class="w"> </span><span class="p">(</span><span class="nf">md5-stuffer</span><span class="w"> </span><span class="p">(</span><span class="nf">build-path</span><span class="w"> </span><span class="p">(</span><span class="nf">find-system-path</span><span class="w"> </span><span class="ss">'home-dir</span><span class="p">)</span><span class="w"> </span><span class="s">".urls"</span><span class="p">))))</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">start</span><span class="w"> </span><span class="nv">req</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">response/xexpr</span>
<span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">"Look ma, no state!"</span><span class="p">)))))</span>
</pre></div>
<p>First we notice the #lang declaration. Racket libraries love to make
new "languages". These languages can include some entirely new syntax
(like the <a href="http://docs.racket-lang.org/algol60/">Algol language implementation</a>) or can simply include a
summary collection of libraries and alternative program entrypoints
(such as this web-server language provides). So the first thing we'll
do to really understand this code is to throw out the custom
language. And while we're at it, we'll throw out all typical imports
provided by the <a href="http://docs.racket-lang.org/reference/">default racket language</a> and use the racket/base
language instead. This will help us get a better understanding of the
Racket libraries and the functions we're using from these libraries.</p>
<p>While we're throwing the language away, we notice the paragraphs just
below that <a href="https://docs.racket-lang.org/web-server/stateless.html#%28part._stateless-example%29">original example</a> in the manual. It mentions that the
web-server language also imports a bunch of modules. We can discover
which of these modules we actually need by searching in the Racket
manual for functions we've used. For instance, <a href="https://docs.racket-lang.org/search/index.html?q=response%2Fxexpr">searching</a> for
"response/xexpr" tells us it's in the <a href="https://docs.racket-lang.org/web-server/http.html#%28part._xexpr%29">web-server/http/xexpr</a>
module. We'll import the modules we need using the "prefix-in" form to
make function-module connections explicit.</p>
<div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">racket/base</span>
<span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">xexpr:</span><span class="w"> </span><span class="nv">web-server/http/xexpr</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">hash:</span><span class="w"> </span><span class="nv">web-server/stuffers/hash</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">stuffer:</span><span class="w"> </span><span class="nv">web-server/stuffers/stuffer</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">serialize:</span><span class="w"> </span><span class="nv">web-server/stuffers/serialize</span><span class="p">))</span>
<span class="p">(</span><span class="nb">provide</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="nv">stuffer</span><span class="w"> </span><span class="nv">start</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="ss">'stateless</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">stuffer</span>
<span class="w"> </span><span class="p">(</span><span class="nf">stuffer:stuffer-chain</span>
<span class="w"> </span><span class="nv">serialize:serialize-stuffer</span>
<span class="w"> </span><span class="p">(</span><span class="nf">hash:md5-stuffer</span><span class="w"> </span><span class="p">(</span><span class="nf">build-path</span><span class="w"> </span><span class="p">(</span><span class="nf">find-system-path</span><span class="w"> </span><span class="ss">'home-dir</span><span class="p">)</span><span class="w"> </span><span class="s">".urls"</span><span class="p">))))</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">start</span><span class="w"> </span><span class="nv">req</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span>
<span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">"Look ma, no state!"</span><span class="p">)))))</span>
</pre></div>
<p>Now we've got something that is a little less magical. We can run this
file by calling it: "racket server.rkt". But nothing happens. That is
because the web-server language would have started the service itself using
the exported variables we provided, and racket/base does no such thing. So we're going to have to figure
out what underlying function calls "start" and call it
ourselves. Unfortunately, searching for "start" in the manual search
field yields nothing relevant. So we Google "racket web server
start". Down the page on the second <a href="https://docs.racket-lang.org/web-server/run.html">search result</a> we notice an
<a href="https://docs.racket-lang.org/web-server/run.html#%28part._.Examples%29">example</a> using the serve/servlet function to register the start
function. This is our in.</p>
<div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">racket/base</span>
<span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">xexpr:</span><span class="w"> </span><span class="nv">web-server/http/xexpr</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">hash:</span><span class="w"> </span><span class="nv">web-server/stuffers/hash</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">stuffer:</span><span class="w"> </span><span class="nv">web-server/stuffers/stuffer</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">serialize:</span><span class="w"> </span><span class="nv">web-server/stuffers/serialize</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">servlet-env:</span><span class="w"> </span><span class="nv">web-server/servlet-env</span><span class="p">))</span>
<span class="p">(</span><span class="nb">provide</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="nv">stuffer</span><span class="w"> </span><span class="nv">start</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="ss">'stateless</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">stuffer</span>
<span class="w"> </span><span class="p">(</span><span class="nf">stuffer:stuffer-chain</span>
<span class="w"> </span><span class="nv">serialize:serialize-stuffer</span>
<span class="w"> </span><span class="p">(</span><span class="nf">hash:md5-stuffer</span><span class="w"> </span><span class="p">(</span><span class="nf">build-path</span><span class="w"> </span><span class="p">(</span><span class="nf">find-system-path</span><span class="w"> </span><span class="ss">'home-dir</span><span class="p">)</span><span class="w"> </span><span class="s">".urls"</span><span class="p">))))</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">start</span><span class="w"> </span><span class="nv">req</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span>
<span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">"Look ma, no state!"</span><span class="p">)))))</span>
<span class="p">(</span><span class="nf">servlet-env:serve/servlet</span><span class="w"> </span><span class="nv">start</span><span class="p">)</span>
</pre></div>
<p>Run this version and it works! We are directed to a browser with our
HTML. But we should clean this code up a bit. We no longer need to
export anything so we'll drop the provide line. We aren't even using
the interface-version and stuffer code. Things seem to be fine without
them, so we'll drop those too. Also, looking at the serve/servlet
<a href="https://docs.racket-lang.org/web-server/run.html#%28def._%28%28lib._web-server%2Fservlet-env..rkt%29._serve%2Fservlet%29%29">documentation</a> we notice some other nice arguments we can tack
on.</p>
<div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">racket/base</span>
<span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">xexpr:</span><span class="w"> </span><span class="nv">web-server/http/xexpr</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">servlet-env:</span><span class="w"> </span><span class="nv">web-server/servlet-env</span><span class="p">))</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">start</span><span class="w"> </span><span class="nv">req</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span>
<span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">"Look ma, no state!"</span><span class="p">)))))</span>
<span class="p">(</span><span class="nf">servlet-env:serve/servlet</span>
<span class="w"> </span><span class="nv">start</span>
<span class="w"> </span><span class="kd">#:servlet-path</span><span class="w"> </span><span class="s">"/"</span>
<span class="w"> </span><span class="kd">#:servlet-regexp</span><span class="w"> </span><span class="o">#</span><span class="nv">rx</span><span class="s">""</span>
<span class="w"> </span><span class="kd">#:stateless?</span><span class="w"> </span><span class="no">#t</span><span class="p">)</span>
</pre></div>
<p>Ah, that's much cleaner. When you run this code, you will no longer be
directed to the /servlets/standalone.rkt path but to the site root,
as set by the #:servlet-path keyword argument. Every other path
you try to reach, such as /foobar, will also map to the start
function, because the empty regexp passed as the #:servlet-regexp keyword
argument matches any path. Finally, the #:stateless? keyword argument
makes the servlet stateless.</p>
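<p>serve/servlet accepts more keyword arguments than the three used
here. As a minimal sketch, assuming you want a fixed port and no
browser window on startup, #:port and #:launch-browser? can be added the
same way. The port 8080 is an arbitrary choice, and wrapping the call in
a main submodule is just one way to let other modules require the file
without starting the server.</p>
<div class="highlight"><pre><span></span>#lang racket/base
(require (prefix-in xexpr: web-server/http/xexpr)
         (prefix-in servlet-env: web-server/servlet-env))
(provide start)

;; Same handler as before: every request gets the same page.
(define (start req)
  (xexpr:response/xexpr
   `(html (body (h2 "Look ma, no state!")))))

;; The main submodule runs for "racket server.rkt" but not on require.
(module+ main
  (servlet-env:serve/servlet
   start
   #:port 8080            ; arbitrary choice; the default is 8000
   #:launch-browser? #f   ; do not open a browser window on startup
   #:servlet-path "/"
   #:servlet-regexp #rx""
   #:stateless? #t))
</pre></div>
<p>With this variant you visit http://localhost:8080/ yourself instead of
being directed to a browser.</p>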
<p>But this is missing two things we could really use out of a simple web
service. The first is routing. We do that by looking up the
documentation for the <a href="https://docs.racket-lang.org/web-server/dispatch.html">web-server/dispatch</a> module. We'll use this
module to define some routes -- adding a 404 route to demonstrate the
usage.</p>
<div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">racket/base</span>
<span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">dispatch:</span><span class="w"> </span><span class="nv">web-server/dispatch</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">xexpr:</span><span class="w"> </span><span class="nv">web-server/http/xexpr</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">servlet:</span><span class="w"> </span><span class="nv">web-server/servlet-env</span><span class="p">))</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">not-found-route</span><span class="w"> </span><span class="nv">request</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span>
<span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">"Uh-oh! Page not found."</span><span class="p">)))))</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">home-route</span><span class="w"> </span><span class="nv">request</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span>
<span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">"Look ma, no state!!!!!!!!!"</span><span class="p">)))))</span>
<span class="p">(</span><span class="k">define-values</span><span class="w"> </span><span class="p">(</span><span class="nf">route-dispatch</span><span class="w"> </span><span class="nv">route-url</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">dispatch:dispatch-rules</span>
<span class="w"> </span><span class="p">[(</span><span class="s">""</span><span class="p">)</span><span class="w"> </span><span class="nv">home-route</span><span class="p">]</span>
<span class="w"> </span><span class="p">[</span><span class="k">else</span><span class="w"> </span><span class="nv">not-found-route</span><span class="p">]))</span>
<span class="p">(</span><span class="nf">servlet:serve/servlet</span>
<span class="w"> </span><span class="nv">route-dispatch</span>
<span class="w"> </span><span class="kd">#:servlet-path</span><span class="w"> </span><span class="s">"/"</span>
<span class="w"> </span><span class="kd">#:servlet-regexp</span><span class="w"> </span><span class="o">#</span><span class="nv">rx</span><span class="s">""</span>
<span class="w"> </span><span class="kd">#:stateless?</span><span class="w"> </span><span class="no">#t</span><span class="p">)</span>
</pre></div>
<p>Run this version and check out the server root. Then try any other
path. Looks good. The final missing piece to this simple web service
is logging. Thankfully, the <a href="https://docs.racket-lang.org/web-server-internal/dispatch-log.html">web-server/dispatchers/dispatch-log</a> module has
us covered with some request formatting functions. So we'll wrap the
route-dispatch function and we'll print out the formatted request.</p>
<div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">racket/base</span>
<span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">dispatch:</span><span class="w"> </span><span class="nv">web-server/dispatch</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">dispatch-log:</span><span class="w"> </span><span class="nv">web-server/dispatchers/dispatch-log</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">xexpr:</span><span class="w"> </span><span class="nv">web-server/http/xexpr</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">servlet:</span><span class="w"> </span><span class="nv">web-server/servlet-env</span><span class="p">))</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">not-found-route</span><span class="w"> </span><span class="nv">request</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span>
<span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">"Uh-oh! Page not found."</span><span class="p">)))))</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">home-route</span><span class="w"> </span><span class="nv">request</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span>
<span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">"Look ma, no state!!!!!!!!!"</span><span class="p">)))))</span>
<span class="p">(</span><span class="k">define-values</span><span class="w"> </span><span class="p">(</span><span class="nf">route-dispatch</span><span class="w"> </span><span class="nv">route-url</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">dispatch:dispatch-rules</span>
<span class="w"> </span><span class="p">[(</span><span class="s">""</span><span class="p">)</span><span class="w"> </span><span class="nv">home-route</span><span class="p">]</span>
<span class="w"> </span><span class="p">[</span><span class="k">else</span><span class="w"> </span><span class="nv">not-found-route</span><span class="p">]))</span>
<span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">route-dispatch/log-middleware</span><span class="w"> </span><span class="nv">req</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nb">display</span><span class="w"> </span><span class="p">(</span><span class="nf">dispatch-log:apache-default-format</span><span class="w"> </span><span class="nv">req</span><span class="p">))</span>
<span class="w"> </span><span class="p">(</span><span class="nf">flush-output</span><span class="p">)</span>
<span class="w"> </span><span class="p">(</span><span class="nf">route-dispatch</span><span class="w"> </span><span class="nv">req</span><span class="p">))</span>
<span class="p">(</span><span class="nf">servlet:serve/servlet</span>
<span class="w"> </span><span class="nv">route-dispatch/log-middleware</span>
<span class="w"> </span><span class="kd">#:servlet-path</span><span class="w"> </span><span class="s">"/"</span>
<span class="w"> </span><span class="kd">#:servlet-regexp</span><span class="w"> </span><span class="o">#</span><span class="nv">rx</span><span class="s">""</span>
<span class="w"> </span><span class="kd">#:stateless?</span><span class="w"> </span><span class="no">#t</span><span class="p">)</span>
</pre></div>
<p>Run this version and notice the logs displayed for each request. Now
you've got a simple web service with routing and logging! I hope this
gives you a taste for how easy it is to build simple web services in
Racket without downloading any third-party libraries. Database drivers
and HTML template libraries are also included and similarly
well-documented. In the future I hope to add an example of a slightly
more advanced web service.</p>
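<p>As a small step in that direction, the routes so far only match fixed
paths, but dispatch-rules can also capture path segments. The sketch
below assumes a made-up hello-route and /hello/... path purely for
illustration, and it requires web-server/dispatch without a prefix so
that the (string-arg) pattern can be written by its documented name.</p>
<div class="highlight"><pre><span></span>#lang racket/base
(require web-server/dispatch
         (prefix-in xexpr: web-server/http/xexpr))

(define (home-route request)
  (xexpr:response/xexpr
   `(html (body (h2 "Look ma, no state!")))))

;; Hypothetical route: the captured path segment arrives as an
;; extra argument after the request.
(define (hello-route request name)
  (xexpr:response/xexpr
   `(html (body (h2 ,(string-append "Hello, " name "!"))))))

(define (not-found-route request)
  (xexpr:response/xexpr
   `(html (body (h2 "Uh-oh! Page not found.")))))

(define-values (route-dispatch route-url)
  (dispatch-rules
   [("") home-route]
   [("hello" (string-arg)) hello-route] ; matches /hello/anything
   [else not-found-route]))
</pre></div>
<p>route-dispatch is passed to serve/servlet exactly as before. The second
value, route-url, goes the other way and builds a path from a route and
its arguments, so (route-url hello-route "world") produces
"/hello/world".</p>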
<p class="note">
I have had huge difficulty discovering the source of Racket
libraries. These library sources are nearly impossible to Google,
and search on GitHub is not much help. Ideally, the official
racket.org docs would link directly to the source of a function where
the function is documented. Of course I could just download the
Racket source and start grepping... but I'm only so interested.
</p><p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Walking through a basic Racket web service <a href="https://t.co/J3us48kzga">https://t.co/J3us48kzga</a> <a href="https://twitter.com/racketlang?ref_src=twsrc%5Etfw">@racketlang</a></p>— Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/814674473681121280?ref_src=twsrc%5Etfw">December 30, 2016</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
http://notes.eatonphil.com/walking-through-a-basic-racket-web-service.htmlThu, 29 Dec 2016 00:00:00 +0000